Patent · US Active

Audio-visual separation of on-screen sounds based on machine learning models

US12217768B2 · kind B2 · utility

0 Cited by

3 References

20 Claims

0 Family size

Assignee

Google LLC · US

Inventors

Efthymios Tzinis · Urbana, US
Scott Wisdom · Boston, US
Aren Jansen · Mountain View, US
John R. Hershey · Winchester, US

Key dates

Filing date	Jul 26, 2023
Grant date	Feb 4, 2025
Priority date	—
Expiry date	Jul 26, 2043

Classification

Technology area (CPC G)	Physics
CPC primary	G06N3/0895
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Apparatus and methods related to separation of audio sources are provided. The method includes receiving an audio waveform associated with a plurality of video frames. The method includes estimating, by a neural network, one or more audio sources associated with the plurality of video frames. The method includes generating, by the neural network, one or more audio embeddings corresponding to the one or more estimated audio sources. The method includes determining, based on the audio embeddings and a video embedding, whether one or more audio sources of the one or more estimated audio sources correspond to objects in the plurality of video frames. The method includes predicting, by the neural network and based on the one or more audio embeddings and the video embedding, a version of the audio waveform comprising audio sources that correspond to objects in the plurality of video frames.
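The abstract's last two steps compare each estimated source's audio embedding with a video embedding, then keep only the sources judged to be on screen. The Python sketch below illustrates that selection logic under stated assumptions. It takes the separated source waveforms and their embeddings as already computed plain lists. In place of the patent's learned neural-network prediction, it uses a fixed cosine-similarity threshold. The names `cosine_similarity`, `on_screen_mix`, and the `threshold` value are hypothetical, chosen only for illustration, and this is not the patented implementation.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero embedding as matching nothing
    return dot / (norm_a * norm_b)


def on_screen_mix(
    sources: List[List[float]],
    audio_embeddings: List[List[float]],
    video_embedding: Sequence[float],
    threshold: float = 0.5,  # hypothetical cut-off, not from the patent
) -> List[float]:
    """Sum the estimated sources whose embedding matches the video embedding.

    Assumption: a source counts as on-screen when its cosine similarity to
    the video embedding exceeds `threshold`. The claimed method instead has
    a neural network make this decision and predict the output waveform.
    """
    if not sources:
        return []
    output = [0.0] * len(sources[0])
    for source, embedding in zip(sources, audio_embeddings):
        if cosine_similarity(embedding, video_embedding) > threshold:
            # Add this on-screen source, sample by sample, into the output mix.
            for i, sample in enumerate(source):
                output[i] += sample
    return output


if __name__ == "__main__":
    # Two toy "separated" sources with 2-D embeddings; only the first
    # points in roughly the same direction as the video embedding.
    sources = [[0.1, 0.2, 0.3], [1.0, -1.0, 0.5]]
    audio_embeddings = [[1.0, 0.0], [0.0, 1.0]]
    video_embedding = [0.9, 0.1]
    print(on_screen_mix(sources, audio_embeddings, video_embedding))
```

With these toy inputs, the first source's similarity to the video embedding is about 0.99 and the second's is about 0.11, so only the first source survives in the output mix. A hard threshold is used here only because it makes the selection step easy to follow. A soft variant could weight each source by a score in [0, 1] instead of including or excluding it outright.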

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.