Patent · US Active

Jointly modeling embedding and translation to bridge video and language

US9807473B2 · kind B2 · utility

Cited by: 14
References: 4
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Nov 20, 2015
Grant date: Oct 31, 2017
Priority date
Expiry date: Nov 20, 2035

Classification

  • Technology area (CPC H): Electricity
  • CPC primary: H04N21/26603
  • WIPO field: Audio-visual technology
  • WIPO sector: Electrical engineering

Abstract

Video description generation using neural network training based on relevance and coherence is described. In some examples, long short-term memory with visual-semantic embedding (LSTM-E) can maximize the probability of generating the next word given previous words and visual content, and can create a visual-semantic embedding space for enforcing the relationship between the semantics of an entire sentence and visual content. LSTM-E can include 2-D and/or 3-D deep convolutional neural networks for learning a powerful video representation, a deep recurrent neural network for generating sentences, and a joint embedding model for exploring the relationships between visual content and sentence semantics.
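
The two training signals named in the abstract, relevance (video and sentence lie close together in a shared embedding space) and coherence (each next word is probable given the previous words and the visual content), can be illustrated as a single weighted objective. The following is a minimal sketch under stated assumptions, not the patented implementation: the squared-distance form of the relevance term, the linear weighting, and the names `T_v`, `T_s`, and `lam` are all hypothetical and do not come from the abstract.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def relevance_loss(video_vec, sentence_vec, T_v, T_s):
    """Squared Euclidean distance between the video and the sentence after
    each is projected into a shared embedding space.
    T_v and T_s are hypothetical projection matrices (an assumption)."""
    ev = matvec(T_v, video_vec)
    es = matvec(T_s, sentence_vec)
    return sum((a - b) ** 2 for a, b in zip(ev, es))


def coherence_loss(next_word_probs):
    """Negative log-likelihood of a sentence. next_word_probs[t] is the
    probability the model assigned to the correct word at step t, given
    the previous words and the visual content."""
    return -sum(math.log(p) for p in next_word_probs)


def joint_loss(video_vec, sentence_vec, T_v, T_s, next_word_probs, lam=0.5):
    """Weighted sum of both terms. lam is an assumed trade-off weight:
    0 keeps only relevance, 1 keeps only coherence."""
    rel = relevance_loss(video_vec, sentence_vec, T_v, T_s)
    coh = coherence_loss(next_word_probs)
    return (1 - lam) * rel + lam * coh


# Toy usage: identity projections, a 2-D video vector, a 2-D sentence vector,
# and a two-word sentence whose correct words each received probability 0.5.
identity = [[1, 0], [0, 1]]
loss = joint_loss([1, 0], [0, 1], identity, identity, [0.5, 0.5], lam=0.5)
# relevance = 2, coherence = 2 * ln 2, so loss is about 1.693
```

In a real system the video vector would come from the convolutional networks, the word probabilities from the recurrent network, and the projections would be learned jointly; here they are fixed toy values so that the arithmetic can be checked by hand.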

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.