Patent · US Active

Intelligent image captioning

US11593612B2 · kind B2 · utility

Cited by	1

References	0

Claims	20

Family size	0

Assignee

BAIDU USA LLC · US

Inventors

Junhua Mao · Palo Alto, US
Wei Xu · Santa Clara, US
Yi Yang · San Jose, US
Jiang Wang · Shanghai, CN
Zhiheng Huang · Sunnyvale, US

Key dates

Filing date	Aug 19, 2019
Grant date	Feb 28, 2023
Priority date	—
Expiry date	Aug 23, 2041

Classification

Technology area (CPC G)	Physics
CPC primary	G06N3/09
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Presented herein are embodiments of a multimodal Recurrent Neural Network (m-RNN) model for generating novel image captions. In embodiments, the model directly models the probability distribution of generating a word given a previous word or words and an image, and image captions are generated according to this distribution. In embodiments, the model comprises two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. In embodiments, these two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of an embodiment of the model was validated on four benchmark datasets, and it outperformed the state-of-the-art methods. In embodiments, the m-RNN model may also be applied to retrieval tasks for retrieving images or captions.
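
The data flow described in the abstract can be illustrated with a toy sketch in plain Python. This is not the patented implementation: the class name `ToyMRNN`, the layer sizes, the tanh activations, the additive fusion in the multimodal layer, the greedy decoding, and the random untrained weights are all assumptions made for illustration, and the image feature vector is a hand-written stand-in for the output of a convolutional network. The sketch only shows how a word embedding, a recurrent state, and an image feature might meet in one multimodal layer to give a probability distribution over the next word.

```python
import math
import random

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def add(*vecs):
    """Element-wise sum of equally sized vectors."""
    return [sum(xs) for xs in zip(*vecs)]

def softmax(z):
    """Numerically stable softmax."""
    top = max(z)
    exps = [math.exp(x - top) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

class ToyMRNN:
    """Hypothetical, untrained stand-in for an m-RNN style captioner."""

    def __init__(self, vocab, embed_dim=4, hidden_dim=5, image_dim=3,
                 multi_dim=6, seed=0):
        rng = random.Random(seed)
        self.vocab = vocab
        self.hidden_dim = hidden_dim
        self.E = rand_matrix(len(vocab), embed_dim, rng)      # word embeddings
        self.W_xh = rand_matrix(hidden_dim, embed_dim, rng)   # word -> recurrent
        self.W_hh = rand_matrix(hidden_dim, hidden_dim, rng)  # recurrent -> recurrent
        self.V_w = rand_matrix(multi_dim, embed_dim, rng)     # word -> multimodal
        self.V_r = rand_matrix(multi_dim, hidden_dim, rng)    # recurrent -> multimodal
        self.V_i = rand_matrix(multi_dim, image_dim, rng)     # image -> multimodal
        self.W_out = rand_matrix(len(vocab), multi_dim, rng)  # multimodal -> vocabulary

    def step(self, word_id, hidden, image_feat):
        """Return P(next word | previous words, image) and the new recurrent state."""
        w = self.E[word_id]
        hidden = [math.tanh(x)
                  for x in add(matvec(self.W_xh, w), matvec(self.W_hh, hidden))]
        # Multimodal layer: fuse the word, sentence-state, and image inputs (assumed additive).
        fused = [math.tanh(x)
                 for x in add(matvec(self.V_w, w),
                              matvec(self.V_r, hidden),
                              matvec(self.V_i, image_feat))]
        return softmax(matvec(self.W_out, fused)), hidden

    def caption(self, image_feat, max_len=10):
        """Greedy decoding: repeatedly pick the most probable next word."""
        start, end = self.vocab.index("<start>"), self.vocab.index("<end>")
        hidden = [0.0] * self.hidden_dim
        word_id, words = start, []
        for _ in range(max_len):
            probs, hidden = self.step(word_id, hidden, image_feat)
            word_id = max(range(len(probs)), key=probs.__getitem__)
            if word_id == end:
                break
            words.append(self.vocab[word_id])
        return words

vocab = ["<start>", "<end>", "a", "dog", "runs"]
model = ToyMRNN(vocab)
image_feat = [0.2, -0.1, 0.7]  # placeholder for a CNN image feature
probs, _ = model.step(vocab.index("<start>"), [0.0] * model.hidden_dim, image_feat)
print(probs)
print(model.caption(image_feat))
```

Because the weights are random, the printed caption is meaningless; a trained model would learn these matrices from image and caption pairs. The same per-word probabilities could, in principle, be multiplied along a candidate sentence to score it against an image, which is one way the retrieval use mentioned in the abstract might be approached.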

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.