Multi-granularity alignment for visual question answering
US12210835B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Sep 16, 2022 |
| Grant date | Jan 28, 2025 |
| Priority date | — |
| Expiry date | Jun 16, 2043 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG06V10/82
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
In one embodiment, a method includes accessing an image and a natural-language question regarding the image and extracting, from the image, a first set of image features at a first level of granularity and a second set of image features at a second level of granularity. The method further includes extracting, from the question, a first set of text features at the first level of granularity and a second set of text features at the second level of granularity; generating a first output representing an alignment between the first set of image features and the first set of text features; generating a second output representing an alignment between the second set of image features and the second set of text features; and determining an answer to the question based on the first output and the second output.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.