Patent · US Active

Multilingual image question answering

US10909329B2 · kind B2 · utility

Cited by	5
References	4
Claims	20
Family size	0

Assignee

BAIDU USA LLC · US

Inventors

Haoyuan Gao · Beijing, CN
Junhua Mao · Palo Alto, US
Jie Zhou · Mason, US
Zhiheng Huang · Sunnyvale, US
Lei Wang · Hangzhou City, CN
Wei Xu · Santa Clara, US

Key dates

Filing date	Apr 25, 2016
Grant date	Feb 2, 2021
Priority date	—
Expiry date	Sep 11, 2038

Classification

Technology area (CPC G)	Physics
CPC primary	G06N3/08
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Embodiments of a multimodal question answering (mQA) system are presented to answer a question about the content of an image. In embodiments, the mQA model comprises four components: a Long Short-Term Memory (LSTM) component to extract the question representation; a Convolutional Neural Network (CNN) component to extract the visual representation; an LSTM component for storing the linguistic context in an answer; and a fusing component to combine the information from the first three components and generate the answer. A Freestyle Multilingual Image Question Answering (FM-IQA) dataset was constructed to train and evaluate embodiments of the mQA model. The quality of the answers that the mQA model generates on this dataset is evaluated by human judges through a Turing Test.
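
The four-component data flow described in the abstract can be illustrated with a small, untrained toy in plain Python. This is only a sketch under stated assumptions, not the patented implementation: the class names `TinyLSTM` and `ToyMQA`, the `<start>` and `<end>` tokens, all dimensions, and the random weights are hypothetical; the CNN is replaced by a fixed placeholder feature vector; and the fusing step (project each source, sum, apply tanh, then softmax over the vocabulary) is an assumed choice that the abstract does not specify.

```python
import math
import random

def rand_matrix(rows, cols, rng):
    """Small random weight matrix as a list of rows (untrained, illustrative)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(z):
    top = max(z)
    exps = [math.exp(x - top) for x in z]
    total = sum(exps)
    return [e / total for e in exps]

class TinyLSTM:
    """Single-layer LSTM over list-based vectors (biases omitted for brevity)."""
    def __init__(self, in_dim, hid_dim, rng):
        self.hid_dim = hid_dim
        # One weight matrix per gate, applied to the concatenation [x; h].
        self.w = {g: rand_matrix(hid_dim, in_dim + hid_dim, rng) for g in "ifog"}

    def step(self, x, h, c):
        xh = x + h  # list concatenation
        i = [sigmoid(v) for v in matvec(self.w["i"], xh)]    # input gate
        f = [sigmoid(v) for v in matvec(self.w["f"], xh)]    # forget gate
        o = [sigmoid(v) for v in matvec(self.w["o"], xh)]    # output gate
        g = [math.tanh(v) for v in matvec(self.w["g"], xh)]  # candidate cell
        c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
        return h, c

    def run(self, inputs):
        """Return the final hidden state after consuming all inputs."""
        h, c = [0.0] * self.hid_dim, [0.0] * self.hid_dim
        for x in inputs:
            h, c = self.step(x, h, c)
        return h

class ToyMQA:
    def __init__(self, vocab, emb_dim=4, hid_dim=5, img_dim=6, seed=0):
        rng = random.Random(seed)
        self.vocab = vocab
        self.emb = rand_matrix(len(vocab), emb_dim, rng)      # word embeddings
        self.question_lstm = TinyLSTM(emb_dim, hid_dim, rng)  # question component
        self.answer_lstm = TinyLSTM(emb_dim, hid_dim, rng)    # answer-context component
        # Fusing component (assumed form): one projection per source, then output layer.
        self.proj_q = rand_matrix(hid_dim, hid_dim, rng)
        self.proj_i = rand_matrix(hid_dim, img_dim, rng)
        self.proj_a = rand_matrix(hid_dim, hid_dim, rng)
        self.out = rand_matrix(len(vocab), hid_dim, rng)

    def embed(self, words):
        return [self.emb[self.vocab.index(w)] for w in words]

    def next_word_probs(self, question, image_features, answer_so_far):
        q = self.question_lstm.run(self.embed(question))
        a = self.answer_lstm.run(self.embed(answer_so_far))
        parts = [matvec(self.proj_q, q),
                 matvec(self.proj_i, image_features),  # stand-in for CNN output
                 matvec(self.proj_a, a)]
        fused = [math.tanh(sum(vals)) for vals in zip(*parts)]
        return softmax(matvec(self.out, fused))

vocab = ["<start>", "what", "color", "is", "the", "bus", "red", "<end>"]
model = ToyMQA(vocab)
image_features = [0.2, -0.1, 0.4, 0.0, 0.3, -0.2]  # placeholder, not a real CNN
probs = model.next_word_probs(
    ["what", "color", "is", "the", "bus"], image_features, ["<start>"])
```

Here `probs` is a probability distribution over the toy vocabulary for the first answer word. Because the weights are random, the distribution is close to uniform; a trained model would be needed before the output means anything. Generating a full answer would repeat the call, appending the chosen word to `answer_so_far` until `<end>` is produced.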

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.