Patent · US Active

Speech detection and enhancement using audio/video fusion

US7689413B2 · kind B2 · utility

11 Cited by

5 References

9 Claims

0 Family size

Assignee

Microsoft Corporation · US

Inventors

John R. Hershey · Winchester, US
Trausti Thor Kristajanson · Redmond, US
Hagai Attias · Seattle, US
Nebojsa Jojic · Redmond, US

Key dates

Filing date	Sep 10, 2007
Grant date	Mar 30, 2010
Priority date	—
Expiry date	Jul 10, 2028

Classification

Technology area (CPC G)	Physics
CPC primary	G10L25/78
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A system and method facilitating speech detection and/or enhancement utilizing audio/video fusion are provided. The present invention fuses audio and video in a probabilistic generative model that implements cross-modal, self-supervised learning, enabling rapid adaptation to audio-visual data. The system can learn to detect and enhance speech in noise given only a short (e.g., 30-second) sequence of audio-visual data. In addition, it automatically learns to track the lips as they move around in the video.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.