Speech detection and enhancement using audio/video fusion
US7689413B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Sep 10, 2007 |
| Grant date | Mar 30, 2010 |
| Priority date | — |
| Expiry date | Jul 10, 2028 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L25/78
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A system and method facilitating speech detection and/or enhancement utilizing audio/video fusion are provided. The present invention fuses audio and video in a probabilistic generative model that implements cross-modal, self-supervised learning, enabling rapid adaptation to audio-visual data. The system can learn to detect and enhance speech in noise given only a short (e.g., 30 second) sequence of audio-visual data. In addition, it automatically learns to track the lips as they move around in the video.
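The general idea of fusing audio and video evidence in a probabilistic model can be illustrated with a toy sketch. This is not the patented model: it assumes a single hidden speech/silence state per frame, two hypothetical scalar features (audio_energy and lip_motion), Gaussian likelihoods with hand-picked parameters, and conditional independence of the two features given the state. All names and values below are illustrative assumptions.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def speech_posterior(audio_energy, lip_motion, params, prior_speech=0.5):
    """Return P(speech | audio, video) for one frame.

    Assumption: audio and video features are conditionally independent
    given the hidden state, so their log-likelihoods simply add.
    """
    log_joint = {}
    for state, prior in (("speech", prior_speech), ("silence", 1.0 - prior_speech)):
        a_mean, a_var = params[state]["audio"]
        v_mean, v_var = params[state]["video"]
        log_joint[state] = (
            math.log(prior)
            + gaussian_logpdf(audio_energy, a_mean, a_var)
            + gaussian_logpdf(lip_motion, v_mean, v_var)
        )
    # Normalize in a numerically stable way (subtract the max log value).
    top = max(log_joint.values())
    norm = sum(math.exp(v - top) for v in log_joint.values())
    return math.exp(log_joint["speech"] - top) / norm


# Hypothetical (mean, variance) pairs per state and modality.
PARAMS = {
    "speech": {"audio": (1.0, 0.25), "video": (1.0, 0.25)},
    "silence": {"audio": (0.0, 0.25), "video": (0.0, 0.25)},
}

if __name__ == "__main__":
    # Ambiguous audio (0.5) combined with clear lip motion (1.0).
    print(round(speech_posterior(0.5, 1.0, PARAMS), 3))
```

In this sketch an audio reading that is ambiguous on its own is resolved by the video feature, which is the intuition behind fusing the two modalities. A learned model of the kind the abstract describes would estimate such parameters from data rather than fix them by hand.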
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.