Patent · US Active

Representation learning from video with spatial audio

US11308329B2 · kind B2 · utility

Cited by	0
References	0
Claims	18
Family size	0

Assignee

Adobe Inc. · US

Inventors

Justin Salamon · San Francisco, US
Bryan Russell · San Francisco, US
Karren Yang · Wynnebrook Manor, US

Key dates

Filing date	May 7, 2020
Grant date	Apr 19, 2022
Priority date	—
Expiry date	Aug 4, 2040

Classification

Technology area (CPC H)	Electricity
CPC primary	H04S2420/11
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A computer system is trained to understand audio-visual spatial correspondence using audio-visual clips having multi-channel audio. The computer system includes an audio subnetwork, video subnetwork, and pretext subnetwork. The audio subnetwork receives the two channels of audio from the audio-visual clips, and the video subnetwork receives the video frames from the audio-visual clips. In a subset of the audio-visual clips the audio-visual spatial relationship is misaligned, causing the audio-visual spatial cues for the audio and video to be incorrect. The audio subnetwork outputs an audio feature vector for each audio-visual clip, and the video subnetwork outputs a video feature vector for each audio-visual clip. The audio and video feature vectors for each audio-visual clip are merged and provided to the pretext subnetwork, which is configured to classify the merged vector as either having a misaligned audio-visual spatial relationship or not. The subnetworks are trained based on the loss calculated from the classification.
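
The data flow described in the abstract can be sketched roughly as follows. This is an illustrative forward pass only, not the patented implementation. Every function name here is hypothetical, the feature extractors are trivial stand-ins for the audio and video subnetworks, and the way misalignment is produced (swapping the two audio channels) is an assumption, since the abstract does not say how the misaligned subset is created.

```python
import math

def make_pretext_example(left, right, frames, misalign):
    """Return ((ch0, ch1), frames, label); label 1 means misaligned."""
    # Assumption: misalignment is simulated by swapping the two audio channels.
    if misalign:
        left, right = right, left
    return (left, right), frames, 1 if misalign else 0

def audio_features(channels):
    # Hypothetical stand-in for the audio subnetwork: mean energy per channel.
    return [sum(x * x for x in ch) / len(ch) for ch in channels]

def video_features(frames):
    # Hypothetical stand-in for the video subnetwork: mean brightness of the
    # left and right halves of the image, averaged over all frames.
    left_sum = right_sum = 0.0
    left_n = right_n = 0
    for frame in frames:
        for row in frame:
            half = len(row) // 2
            left_sum += sum(row[:half])
            right_sum += sum(row[half:])
            left_n += half
            right_n += len(row) - half
    return [left_sum / left_n, right_sum / right_n]

def pretext_probability(merged, weights, bias):
    # Stand-in for the pretext subnetwork: a single logistic unit that
    # outputs the probability that the merged vector is misaligned.
    z = sum(w * x for w, x in zip(weights, merged)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def bce_loss(p, label):
    # Binary cross-entropy on the aligned/misaligned classification.
    eps = 1e-12
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))

def forward_loss(left, right, frames, misalign, weights, bias):
    channels, frames, label = make_pretext_example(left, right, frames, misalign)
    # Merge by concatenating the audio and video feature vectors.
    merged = audio_features(channels) + video_features(frames)
    p = pretext_probability(merged, weights, bias)
    return merged, p, bce_loss(p, label)

# Toy clip: sound only in the first channel, brightness only on the left.
left, right = [1.0, 1.0], [0.0, 0.0]
frames = [[[1.0, 0.0], [1.0, 0.0]]]
merged, p, loss = forward_loss(left, right, frames, False, [0.0] * 4, 0.0)
print(merged, p, round(loss, 4))  # [1.0, 0.0, 1.0, 0.0] 0.5 0.6931
```

In a real system the three stand-ins would be trainable networks, and the loss would be backpropagated through all of them, which is what the abstract means by training the subnetworks on the classification loss. A single logistic unit over a concatenated vector is used here only to keep the sketch short.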

Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.