Systems and methods for attention-based configurable convolutional neural networks (ABC-CNN) for visual question answering
US9965705B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jun 16, 2016 |
| Grant date | May 8, 2018 |
| Priority date | — |
| Expiry date | Aug 10, 2036 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N3/044
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Described herein are systems and methods for generating and using attention-based deep learning architectures for the visual question answering (VQA) task, which automatically generate answers to questions about images (still or video). To generate correct answers, it is important for a model's attention to focus on the image regions relevant to the question, because different questions may ask about the attributes of different image regions. In embodiments, such question-guided attention is learned with an attention-based configurable convolutional neural network (ABC-CNN). Embodiments of the ABC-CNN models determine the attention maps by convolving the image feature map with configurable convolutional kernels determined by the semantics of the question. In embodiments, the question-guided attention maps focus on the question-related regions and filter out noise in the unrelated regions.
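The core attention step in the abstract, convolving an image feature map with a kernel generated from the question, can be illustrated with a minimal NumPy sketch. It rests on assumptions this record does not state: the question is already encoded as a fixed-length vector; a hypothetical linear projection followed by tanh turns that vector into a single C×k×k kernel; the convolution uses zero "same" padding with an odd kernel size; and the resulting logits are normalized with a softmax over spatial positions. The names `question_to_kernel`, `question_guided_attention`, and `attend` are illustrative only and are not taken from the patent.

```python
import numpy as np


def question_to_kernel(q, W, b, channels, k):
    """Hypothetical kernel generator (an assumption, not the patent's):
    map a question embedding q of shape [d] to a configurable
    convolution kernel of shape [channels, k, k] via tanh(W @ q + b)."""
    return np.tanh(W @ q + b).reshape(channels, k, k)


def question_guided_attention(feature_map, kernel):
    """Convolve an image feature map [C, H, W] with a question-derived
    kernel [C, k, k] (assumed: zero 'same' padding, odd k), then apply a
    softmax over all H*W positions so the attention map sums to 1."""
    C, H, W = feature_map.shape
    k = kernel.shape[1]
    pad = k // 2
    padded = np.pad(feature_map, ((0, 0), (pad, pad), (pad, pad)))
    logits = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            # Cross-correlation (deep-learning "convolution"), summed over channels.
            logits[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    # Numerically stable softmax over spatial positions.
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


def attend(feature_map, attention):
    """Reweight every spatial location of the feature map by its attention
    weight, damping regions the question does not refer to."""
    return feature_map * attention[np.newaxis, :, :]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, H, W, d, k = 8, 5, 5, 16, 3
    features = rng.standard_normal((C, H, W))   # stand-in image feature map
    q = rng.standard_normal(d)                   # stand-in question embedding
    W_proj = rng.standard_normal((C * k * k, d)) * 0.1
    b_proj = np.zeros(C * k * k)
    kernel = question_to_kernel(q, W_proj, b_proj, C, k)
    att = question_guided_attention(features, kernel)
    print(att.shape)  # (5, 5)
    weighted = attend(features, att)
```

Generating the kernel from the question is what makes the convolution "configurable" in this sketch: the same image features produce different attention maps for different questions, because only the kernel changes.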
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.