Patent · US · Active

Audio-visual speech separation

US11894014B2 · kind B2 · utility

Cited by	0
References	2
Claims	20
Family size	0

Assignee

Google LLC · US

Inventors

Inbar Mosseri · Raanana, IL
Michael Rubinstein · Natick, US
Ariel Ephrat · Jerusalem, IL
William T. Freeman · Cambridge, US
Oran Lang · Givatayim, IL
Kevin William Wilson · Cambridge, US
Tali Dekel · Arlington, US
Avinatan Hassidim

Key dates

Filing date	Sep 22, 2022
Grant date	Feb 6, 2024
Priority date	—
Expiry date	Sep 22, 2042

Classification

Technology area (CPC G)	Physics
CPC primary	G10L21/18
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for audio-visual speech separation. A method includes: obtaining, for each frame in a stream of frames from a video in which faces of one or more speakers have been detected, a respective per-frame face embedding of the face of each speaker; processing, for each speaker, the per-frame face embeddings of the face of the speaker to generate visual features for the face of the speaker; obtaining a spectrogram of an audio soundtrack for the video; processing the spectrogram to generate an audio embedding for the audio soundtrack; combining the visual features for the one or more speakers and the audio embedding for the audio soundtrack to generate an audio-visual embedding for the video; determining a respective spectrogram mask for each of the one or more speakers; and determining a respective isolated speech spectrogram for each speaker.
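The data flow in the abstract can be sketched as follows. This is only an illustrative sketch, not the patented implementation: random weight matrices stand in for the trained networks, the function name `separate_speech` and the sizes are hypothetical, video frames are assumed to be aligned one-to-one with spectrogram time frames, and the isolated spectrogram is assumed to be the element-wise product of each mask with the mixture spectrogram (the abstract does not state how the mask is applied).

```python
import numpy as np


def separate_speech(face_embeddings, spectrogram, rng, hidden=16):
    """Toy version of the abstract's steps (hypothetical helper).

    face_embeddings: (speakers, frames, face_dim) per-frame face embeddings.
    spectrogram:     (frames, freq_bins) magnitude spectrogram of the soundtrack.
    Returns (masks, isolated), each of shape (speakers, frames, freq_bins).
    """
    num_speakers, num_frames, face_dim = face_embeddings.shape
    num_bins = spectrogram.shape[1]

    # Random weights are placeholders for trained networks (assumption).
    w_visual = rng.standard_normal((face_dim, hidden))
    w_audio = rng.standard_normal((num_bins, hidden))
    w_mask = rng.standard_normal(
        (hidden * (num_speakers + 1), num_speakers * num_bins)
    )

    # Visual features per speaker, and an audio embedding for the soundtrack.
    visual = np.tanh(face_embeddings @ w_visual)   # (S, T, H)
    audio = np.tanh(spectrogram @ w_audio)         # (T, H)

    # Combine by per-frame concatenation into an audio-visual embedding.
    visual_per_frame = visual.transpose(1, 0, 2).reshape(num_frames, -1)
    audio_visual = np.concatenate([visual_per_frame, audio], axis=1)

    # One spectrogram mask per speaker, squashed into (0, 1) with a sigmoid.
    logits = (audio_visual @ w_mask).reshape(num_frames, num_speakers, num_bins)
    masks = (1.0 / (1.0 + np.exp(-logits))).transpose(1, 0, 2)  # (S, T, F)

    # Assumed application step: mask times mixture, broadcast over speakers.
    isolated = masks * spectrogram[None, :, :]
    return masks, isolated


rng = np.random.default_rng(0)
faces = rng.standard_normal((2, 5, 4))            # 2 speakers, 5 frames
mixture = np.abs(rng.standard_normal((5, 8)))     # 5 frames, 8 frequency bins
masks, isolated = separate_speech(faces, mixture, rng)
print(masks.shape, isolated.shape)
```

With untrained weights the masks are meaningless; the sketch only shows how the tensors named in the abstract relate to one another in shape and order of computation.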

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.