Neural-symbolic action transformers for video question answering
US12175384B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jul 21, 2021 |
| Grant date | Dec 24, 2024 |
| Priority date | — |
| Expiry date | Feb 11, 2043 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG06N3/0464
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
Mechanisms are provided for performing artificial intelligence-based video question answering. A video parser parses an input video data sequence to generate situation data structure(s), each situation data structure comprising data elements corresponding to entities, and first relationships between entities, identified by the video parser as present in images of the input video data sequence. First machine learning computer model(s) operate on the situation data structure(s) to predict second relationship(s) between the situation data structure(s). Second machine learning computer model(s) execute on a received input question to predict an executable program to execute to answer the received question. The program is executed on the situation data structure(s) and predicted second relationship(s). An answer to the question is output based on results of executing the program.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.