Patent · US Active

Joint heterogeneous language-vision embeddings for video tagging and search

US11409791B2 · kind B2 · utility

3 Cited by

28 References

20 Claims

0 Family size

Assignee

Disney Enterprises, Inc. · US

Inventors

Atousa Torabi · Pittsburgh, US
Leonid Sigal · Burbank, US

Key dates

Filing date	Jun 12, 2017
Grant date	Aug 9, 2022
Priority date	—
Expiry date	Oct 29, 2038

Classification

Technology area (CPC H)	Electricity
CPC primary	H04N21/8405
WIPO field	Audio-visual technology
WIPO sector	Electrical engineering

Abstract

Systems, methods and articles of manufacture for modeling a joint language-visual space. A textual query to be evaluated relative to a video library is received from a requesting entity. The video library contains a plurality of instances of video content. One or more instances of video content from the video library that correspond to the textual query are determined by analyzing the textual query using a data model that includes a soft-attention neural network module that is jointly trained with a language Long Short-Term Memory (LSTM) neural network module and a video LSTM neural network module. At least an indication of the one or more instances of video content is returned to the requesting entity.
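The retrieval flow in the abstract (a query is compared against each video through a soft-attention module, and matching videos are returned) can be illustrated with a minimal sketch. It assumes that the language LSTM has already reduced the query to one vector and that the video LSTM has already produced one hidden-state vector per frame; those vectors are hand-written placeholders here, and no LSTM or joint training is implemented. Dot-product attention, cosine scoring, and the names `attend`, `search`, and `LIBRARY` are illustrative assumptions, not the patented method.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, frames: List[Vector]) -> List[float]:
    """Soft attention (assumed dot-product form): weight each per-frame
    state by softmax(query . frame) and return the weighted sum."""
    weights = softmax([dot(query, h) for h in frames])
    dim = len(frames[0])
    return [sum(w * h[i] for w, h in zip(weights, frames)) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def search(query: Vector, library: Dict[str, List[Vector]],
           top_k: int = 1) -> List[Tuple[str, float]]:
    """Score every video by how well its attended summary matches the
    query, then return the top_k (video id, score) pairs."""
    scored = [(vid, cosine(query, attend(query, frames)))
              for vid, frames in library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical 2-D placeholders standing in for video-LSTM frame states.
LIBRARY: Dict[str, List[Vector]] = {
    "beach_clip": [[0.9, 0.1], [0.8, 0.2]],
    "city_clip": [[0.1, 0.9], [0.0, 1.0]],
}

if __name__ == "__main__":
    # Placeholder for a language-LSTM query embedding.
    query = [1.0, 0.0]
    print(search(query, LIBRARY, top_k=2))
```

Attending per query, rather than pooling each video once, lets the same clip produce a different summary for different queries. That is the usual motivation for putting a soft-attention module between the language and video encoders.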

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.