Bi-directional spatial-temporal reasoning for video-grounded dialogues
US11288438B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Feb 4, 2020 |
| Grant date | Mar 29, 2022 |
| Priority date | — |
| Expiry date | Sep 17, 2040 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG06N3/048
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
Systems and methods are provided for performing a video-grounded dialogue task by a neural network model using bi-directional spatial-temporal reasoning. According to some embodiments, the systems and methods implement a dual network architecture or framework. This framework includes one network or reasoning module that learns dependencies between text and video in the direction of spatial→temporal, and another network or reasoning module that learns in the direction of temporal→spatial. The output of the multimodal reasoning modules may be combined to learn dependencies between language features in dialogues. The result joint representation is used as a contextual feature to the decoding components which allow the model to semantically generate meaningful responses to the users. In some embodiments, pointer networks are extended to the video-grounded dialogue task to allow the model to point to specific tokens from multiple source sequences to generate responses.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.