Patent · US Active

Jointly modeling embedding and translation to bridge video and language

US9807473B2 · kind B2 · utility

14 Cited by

4 References

20 Claims

0 Family size

Assignee

MICROSOFT TECHNOLOGY LICENSING, LLC · US

Inventors

Tao Mei · Beijing, CN
Ting Yao · Beijing, CN
Yong Rui · Sammamish, US

Key dates

Filing date	Nov 20, 2015
Grant date	Oct 31, 2017
Priority date	—
Expiry date	Nov 20, 2035

Classification

Technology area (CPC H)	Electricity
CPC primary	H04N21/26603
WIPO field	Audio-visual technology
WIPO sector	Electrical engineering

Abstract

Video description generation using neural network training based on relevance and coherence is described. In some examples, long short-term memory with visual-semantic embedding (LSTM-E) can maximize the probability of generating the next word given the previous words and the visual content, and can create a visual-semantic embedding space that enforces the relationship between the semantics of an entire sentence and the visual content. LSTM-E can include 2-D and/or 3-D deep convolutional neural networks for learning powerful video representations, a deep recurrent neural network for generating sentences, and a joint embedding model for exploring the relationships between visual content and sentence semantics.
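The abstract names two training signals: coherence (the likelihood of each next word given the previous words and the video) and relevance (agreement between the whole sentence and the video in a shared embedding space). The sketch below shows one plausible way to combine them. It assumes, without confirmation from this record, a weighted sum of a squared-distance relevance term and a negative log-likelihood coherence term. The projection matrices `t_v` and `t_s`, the trade-off weight `lam`, and all function names are hypothetical. The per-word probabilities are supplied directly rather than produced by an actual LSTM decoder.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def matvec(m: Matrix, v: Vector) -> List[float]:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    if any(len(row) != len(v) for row in m):
        raise ValueError("matrix columns must match vector length")
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def relevance_loss(video: Vector, sentence: Vector, t_v: Matrix, t_s: Matrix) -> float:
    """Squared Euclidean distance between the projected video and sentence vectors
    in a shared (hypothetical) visual-semantic embedding space."""
    ev, es = matvec(t_v, video), matvec(t_s, sentence)
    if len(ev) != len(es):
        raise ValueError("t_v and t_s must project into the same embedding dimension")
    return sum((a - b) ** 2 for a, b in zip(ev, es))


def coherence_loss(word_probs: Sequence[float]) -> float:
    """Negative log-likelihood of a sentence, given p(w_t | w_<t, video) for each word.
    The probabilities are passed in directly instead of coming from an LSTM decoder."""
    if any(not 0.0 < p <= 1.0 for p in word_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    return -sum(math.log(p) for p in word_probs)


def joint_loss(video: Vector, sentence: Vector, t_v: Matrix, t_s: Matrix,
               word_probs: Sequence[float], lam: float) -> float:
    """Hypothetical weighted sum: (1 - lam) * relevance + lam * coherence."""
    return ((1.0 - lam) * relevance_loss(video, sentence, t_v, t_s)
            + lam * coherence_loss(word_probs))


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    loss = joint_loss([1.0, 2.0], [1.0, 0.0], identity, identity, [0.5, 0.5], lam=0.5)
    # 0.5 * 4.0 + 0.5 * (2 * ln 2) = 2 + ln 2
    print(round(loss, 4))  # prints 2.6931
```

Moving `lam` toward 1 weights fluent word-by-word generation more heavily; moving it toward 0 weights video-sentence relevance. A trained system would learn the projections and the decoder jointly by gradient descent, which this sketch omits.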

Source: USPTO / EPO open patent data (objective bibliographic data and citation counts).