Multilingual image question answering
US10909329B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Apr 25, 2016 |
| Grant date | Feb 2, 2021 |
| Priority date | — |
| Expiry date | Sep 11, 2038 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N3/08
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Embodiments of a multimodal question answering (mQA) system are presented to answer a question about the content of an image. In embodiments, the model comprises four components: a Long Short-Term Memory (LSTM) component to extract the question representation; a Convolutional Neural Network (CNN) component to extract the visual representation; an LSTM component for storing the linguistic context in an answer; and a fusing component to combine the information from the first three components and generate the answer. A Freestyle Multilingual Image Question Answering (FM-IQA) dataset was constructed to train and evaluate embodiments of the mQA model. The quality of the generated answers of the mQA model on this dataset is evaluated by human judges through a Turing Test.
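The data flow among the four components named in the abstract can be illustrated with a small, untrained toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the class `TinyLSTM` (a bias-free LSTM cell), the mean-pooling function `image_features` standing in for the CNN, the single linear layer `fuse_w` with a softmax as the fusing component, the ten-word vocabulary, the vector sizes, and greedy decoding. With random weights the produced words are meaningless; the sketch only shows how the three representations could be combined at each answer step.

```python
import math
import random

random.seed(0)  # fixed seed so the toy weights are reproducible

def rand_matrix(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

class TinyLSTM:
    """Hypothetical minimal LSTM cell (no biases), for illustration only."""
    def __init__(self, input_size, hidden_size):
        self.hidden_size = hidden_size
        # One weight matrix per gate, applied to [input ; previous hidden].
        self.w = {g: rand_matrix(hidden_size, input_size + hidden_size) for g in "ifoc"}

    def step(self, x, h, c):
        z = x + h  # list concatenation
        i = [sigmoid(a) for a in matvec(self.w["i"], z)]    # input gate
        f = [sigmoid(a) for a in matvec(self.w["f"], z)]    # forget gate
        o = [sigmoid(a) for a in matvec(self.w["o"], z)]    # output gate
        g = [math.tanh(a) for a in matvec(self.w["c"], z)]  # candidate state
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c

    def run(self, inputs):
        h, c = [0.0] * self.hidden_size, [0.0] * self.hidden_size
        for x in inputs:
            h, c = self.step(x, h, c)
        return h

# Assumed toy vocabulary and sizes (not from the patent).
VOCAB = ["<bos>", "<eos>", "what", "is", "the", "color", "of", "bus", "red", "yellow"]
EMBED, HIDDEN, IMG_FEAT = 8, 8, 6
embedding = {w: [random.uniform(-0.1, 0.1) for _ in range(EMBED)] for w in VOCAB}

def image_features(pixels):
    """Stand-in for the CNN: mean-pools a flat pixel list into IMG_FEAT values."""
    size = len(pixels) // IMG_FEAT
    return [sum(pixels[k * size:(k + 1) * size]) / size for k in range(IMG_FEAT)]

question_lstm = TinyLSTM(EMBED, HIDDEN)  # component 1: question representation
answer_lstm = TinyLSTM(EMBED, HIDDEN)    # component 3: linguistic context of the answer
fuse_w = rand_matrix(len(VOCAB), HIDDEN + IMG_FEAT + HIDDEN)  # component 4: fusing layer

def answer(question_words, pixels, max_len=5):
    q = question_lstm.run([embedding[w] for w in question_words])
    v = image_features(pixels)           # component 2: visual representation
    h, c = [0.0] * HIDDEN, [0.0] * HIDDEN
    words = ["<bos>"]
    while len(words) <= max_len:
        h, c = answer_lstm.step(embedding[words[-1]], h, c)
        probs = softmax(matvec(fuse_w, q + v + h))  # fuse all three representations
        next_word = VOCAB[probs.index(max(probs))]  # greedy choice (assumption)
        if next_word == "<eos>":
            break
        words.append(next_word)
    return words[1:]

pixels = [k / 12 for k in range(12)]  # fake 12-pixel "image"
result = answer(["what", "is", "the", "color", "of", "the", "bus"], pixels)
print(result)
```

The fusing step is kept as one linear map over the concatenated vectors so that each component's contribution stays visible; a trained system would learn these weights from question, image, and answer triples such as those in the FM-IQA dataset.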
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.