Patent · US Active

Joint heterogeneous language-vision embeddings for video tagging and search

US11409791B2 · kind B2 · utility

Cited by: 3
References: 28
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jun 12, 2017
Grant date: Aug 9, 2022
Priority date
Expiry date: Oct 29, 2038

Classification

  • Technology area (CPC H): Electricity
  • CPC primary: H04N21/8405
  • WIPO field: Audio-visual technology
  • WIPO sector: Electrical engineering

Abstract

Systems, methods and articles of manufacture for modeling a joint language-visual space. A textual query to be evaluated relative to a video library is received from a requesting entity. The video library contains a plurality of instances of video content. One or more instances of video content from the video library that correspond to the textual query are determined by analyzing the textual query using a data model that includes a soft-attention neural network module that is jointly trained with a language Long Short-Term Memory (LSTM) neural network module and a video LSTM neural network module. At least an indication of the one or more instances of video content is returned to the requesting entity.
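
The retrieval flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the language and video LSTM modules have already mapped the query and each video frame into a shared vector space, so the vectors below are hand-written toy values standing in for learned encoder outputs. It also assumes the soft-attention step weights frames by their dot product with the query, which the abstract does not specify. The names `attend`, `search`, `library`, `video_a` and `video_b` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(dot(a, a))
    nb = math.sqrt(dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot(a, b) / (na * nb)


def attend(query_vec, frame_vecs):
    """Soft attention (assumed form): weight each frame by softmax of its
    dot product with the query, then return the weighted average frame."""
    weights = softmax([dot(query_vec, f) for f in frame_vecs])
    dim = len(query_vec)
    pooled = [
        sum(w * f[i] for w, f in zip(weights, frame_vecs)) for i in range(dim)
    ]
    return pooled, weights


def search(query_vec, library, top_k=1):
    """Rank videos by cosine similarity between the query embedding and the
    attention-pooled video embedding; return the top_k video identifiers."""
    scored = []
    for video_id, frame_vecs in library.items():
        pooled, _ = attend(query_vec, frame_vecs)
        scored.append((cosine(query_vec, pooled), video_id))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [video_id for _, video_id in scored[:top_k]]


# Toy stand-ins for encoder outputs (hypothetical values, 2-D for readability).
library = {
    "video_a": [[1.0, 0.0], [0.9, 0.1]],
    "video_b": [[0.0, 1.0], [0.1, 0.9]],
}
query = [1.0, 0.0]
result = search(query, library)
```

Here `result` holds the identifiers of the best-matching videos, which corresponds to returning "an indication of the one or more instances of video content" to the requesting entity. In a trained system the three modules would be optimized jointly so that matching text and video land close together in the shared space; the sketch covers only the inference-time ranking step.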

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.