Speech signal enhancement using visual information
US9293151B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Oct 17, 2011 |
| Grant date | Mar 22, 2016 |
| Priority date | — |
| Expiry date | Dec 19, 2031 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L2021/02082
- WIPO field: Audio-visual technology
- WIPO sector: Electrical engineering
Abstract
Visual information is used to alter or set an operating parameter of an audio signal processor, other than a beamformer. A digital camera captures visual information about a scene that includes a human speaker and/or a listener. The visual information is analyzed to ascertain information about acoustics of a room. A distance between the speaker and a microphone may be estimated, and this distance estimate may be used to adjust an overall gain of the system. Distances among, and locations of, the speaker, the listener, the microphone, a loudspeaker and/or a sound-reflecting surface may be estimated. These estimates may be used to estimate reverberations within the room and adjust aggressiveness of an anti-reverberation filter, based on an estimated ratio of direct to indirect (reverberated) sound energy expected to reach the microphone. In addition, orientation of the speaker or the listener, relative to the microphone or the loudspeaker, can also be estimated, and this estimate may be used to adjust frequency-dependent filter weights to compensate for uneven frequency propagation of acoustic signals from a mouth, or to a human ear, about a human head.
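To make the abstract's two main adjustments concrete, the following is a minimal illustrative sketch, not the patented method. It assumes a camera-derived speaker-to-microphone distance is already available, uses the free-field inverse-distance law for gain compensation, and uses a simple diffuse-field model in which the direct-to-reverberant ratio is 0 dB at the room's critical distance. All function names, the 1 m reference distance, and the linear mapping from ratio to filter aggressiveness are hypothetical choices made for illustration.

```python
import math


def gain_db_for_distance(distance_m, reference_m=1.0):
    """Gain in dB that offsets free-field 1/r level loss.

    Assumption: sound pressure falls off as 1/r, so each doubling of
    distance costs about 6 dB relative to the reference distance.
    """
    if distance_m <= 0 or reference_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(distance_m / reference_m)


def direct_to_reverberant_db(distance_m, critical_distance_m):
    """Estimated direct-to-reverberant ratio in dB.

    Assumption: diffuse reverberant field, so direct and reverberant
    energy are equal (0 dB) at the critical distance.
    """
    if distance_m <= 0 or critical_distance_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(critical_distance_m / distance_m)


def dereverb_aggressiveness(drr_db, low_db=-10.0, high_db=10.0):
    """Map the ratio to an aggressiveness value in [0, 1].

    Hypothetical linear mapping: 1.0 (most aggressive) at or below
    low_db, 0.0 (filter effectively off) at or above high_db.
    """
    t = (high_db - drr_db) / (high_db - low_db)
    return min(1.0, max(0.0, t))


if __name__ == "__main__":
    distance = 2.0           # metres, e.g. from a visual distance estimate
    critical_distance = 1.0  # metres, assumed known for the room
    drr = direct_to_reverberant_db(distance, critical_distance)
    print(gain_db_for_distance(distance))
    print(drr)
    print(dereverb_aggressiveness(drr))
```

In this sketch, a speaker who moves farther from the microphone receives more gain and a more aggressive anti-reverberation setting. The orientation-dependent filter weights mentioned in the abstract are not modeled here.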
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.