Patent · US Active

Bi-directional spatial-temporal reasoning for video-grounded dialogues

US11288438B2 · kind B2 · utility

Cited by: 1
References: 5
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Feb 4, 2020
Grant date: Mar 29, 2022
Priority date
Expiry date: Sep 17, 2040

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N3/048
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Systems and methods are provided for performing a video-grounded dialogue task by a neural network model using bi-directional spatial-temporal reasoning. According to some embodiments, the systems and methods implement a dual network architecture or framework. This framework includes one network or reasoning module that learns dependencies between text and video in the spatial→temporal direction, and another network or reasoning module that learns them in the temporal→spatial direction. The outputs of the multimodal reasoning modules may be combined to learn dependencies between language features in dialogues. The resulting joint representation is used as a contextual feature for the decoding components, which allow the model to generate semantically meaningful responses to users. In some embodiments, pointer networks are extended to the video-grounded dialogue task so that the model can point to specific tokens from multiple source sequences when generating responses.
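The dual reasoning framework and the multi-source pointer mechanism described in the abstract can be illustrated with a toy, dependency-free Python sketch. It is not the claimed implementation. It assumes plain dot-product attention with no learned weights, a single text query vector, video features arranged as frames × spatial regions, concatenation as the way the two reasoning directions are combined, and fixed mixing gates for the pointer step. All function names (`attend`, `spatial_to_temporal`, `temporal_to_spatial`, `bidirectional_reasoning`, `pointer_mixture`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Toy sketch (assumption): untrained dot-product attention stands in for the
# learned reasoning modules; this is not the patented model.
Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, items: List[Vector]) -> Vector:
    # Weight each item by softmax(query . item) and return the weighted sum.
    weights = softmax([dot(query, it) for it in items])
    dim = len(items[0])
    return [sum(w * it[k] for w, it in zip(weights, items)) for k in range(dim)]


def spatial_to_temporal(query: Vector, video: List[List[Vector]]) -> Vector:
    # video[t][s] is the feature of spatial region s in frame t.
    # Attend over regions within each frame first, then over the frame summaries.
    frame_vecs = [attend(query, frame) for frame in video]
    return attend(query, frame_vecs)


def temporal_to_spatial(query: Vector, video: List[List[Vector]]) -> Vector:
    # Assumes every frame has the same number of regions.
    # Attend over time for each region position first, then over the regions.
    num_regions = len(video[0])
    region_vecs = [attend(query, [frame[s] for frame in video])
                   for s in range(num_regions)]
    return attend(query, region_vecs)


def bidirectional_reasoning(query: Vector, video: List[List[Vector]]) -> Vector:
    # Assumption: the two directions are combined by concatenation.
    return spatial_to_temporal(query, video) + temporal_to_spatial(query, video)


def pointer_mixture(
    vocab_probs: Dict[str, float],
    sources: List[List[str]],
    source_attn: List[List[float]],
    gates: List[float],
) -> Dict[str, float]:
    # gates[0] weights the vocabulary distribution; gates[k + 1] weights copying
    # tokens from sources[k] with that source's attention weights.
    # Assumption: gates are fixed here and sum to 1 (a model would predict them).
    out: Dict[str, float] = {w: gates[0] * p for w, p in vocab_probs.items()}
    for gate, tokens, attn in zip(gates[1:], sources, source_attn):
        for tok, a in zip(tokens, attn):
            out[tok] = out.get(tok, 0.0) + gate * a
    return out


if __name__ == "__main__":
    # 2 frames x 2 regions x 2-dim features, with a 2-dim text query.
    video = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, 1.0]]]
    joint = bidirectional_reasoning([1.0, 0.0], video)
    print(len(joint))  # 4: two 2-dim vectors concatenated

    dist = pointer_mixture(
        {"yes": 0.6, "no": 0.4},
        [["the", "man"], ["a", "dog"]],
        [[0.3, 0.7], [0.5, 0.5]],
        [0.5, 0.3, 0.2],
    )
    print(round(sum(dist.values()), 6))  # 1.0
```

The order of the two attention passes is what distinguishes the directions: spatial→temporal summarizes each frame before weighing frames, while temporal→spatial tracks each region over time before weighing regions. In a trained model, the attention scores, the combination step, and the pointer gates would come from learned layers rather than being fixed by hand.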

Source: USPTO / EPO open patent data (bibliographic data and citation counts).