Patent · US Active

Localization of narrations in image data

US12118787B2 · kind B2 · utility

Cited by	1

References	2

Claims	19

Family size	0

Assignee

Adobe Inc. · US

Inventors

Hailin Jin · San Jose, US
Bryan Russell · San Francisco, US
Reuben Xin Hong Tan · San Jose, US

Key dates

Filing date	Oct 12, 2021
Grant date	Oct 15, 2024
Priority date	—
Expiry date	Oct 17, 2042

Classification

Technology area (CPC G)	Physics
CPC primary	G10L25/54
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Methods, systems, and computer storage media are provided for multi-modal localization. Input data comprising two modalities, such as image data and corresponding text or audio data, may be received. A phrase may be extracted from the text or audio data, and a neural network system may be utilized to spatially and temporally localize the phrase within the image data. The neural network system may include a plurality of cross-modal attention layers that each compare features across the first and second modalities without comparing features of the same modality. Using the cross-modal attention layers, a region or subset of pixels within one or more frames of the image data may be identified as corresponding to the phrase, and a localization indicator may be presented for display with the image data. Embodiments may also include unsupervised training of the neural network system.
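
The cross-modal attention idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the function names (`cross_modal_attention`, `localize_phrase`), the residual update, the mean pooling, and the dot-product scoring are all assumptions made for illustration, and the learned projections and training procedure of a real system are omitted. The sketch shows only that each layer compares phrase features with region features, never features of the same modality with each other, and that the best-scoring frame and region are then reported.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(queries, context):
    # Hypothetical helper: rows of `queries` (one modality) attend only
    # over rows of `context` (the other modality). No query-query or
    # context-context comparison is computed.
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)   # (n_queries, n_context)
    weights = softmax(scores, axis=-1)
    return weights @ context, weights

def localize_phrase(phrase_feats, region_feats, num_layers=2):
    # phrase_feats: (n_tokens, d) features of the extracted phrase.
    # region_feats: (n_frames, n_regions, d) features of image regions.
    # Returns (frame_index, region_index) of the best-matching region.
    n_frames, n_regions, d = region_feats.shape
    text = phrase_feats
    vis = region_feats.reshape(n_frames * n_regions, d)
    for _ in range(num_layers):
        text_update, _ = cross_modal_attention(text, vis)  # text -> image
        vis_update, _ = cross_modal_attention(vis, text)   # image -> text
        # Residual update (an assumption of this sketch).
        text, vis = text + text_update, vis + vis_update
    phrase_vec = text.mean(axis=0)   # pool the phrase into one vector
    scores = vis @ phrase_vec        # one score per (frame, region)
    frame, region = divmod(int(np.argmax(scores)), n_regions)
    return frame, region

# Toy input: 2 frames x 3 regions; only frame 1, region 2 resembles the phrase.
phrase = np.array([[3.0, 0.0, 0.0, 0.0]])
regions = np.zeros((2, 3, 4))
regions[1, 2] = phrase[0]
result = localize_phrase(phrase, regions)
```

In this toy input, `result` identifies the frame and region whose features match the phrase; a display layer could then draw a localization indicator, such as a bounding box, at that position.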

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.