Patent · US Active

Bi-directional spatial-temporal reasoning for video-grounded dialogues

US11288438B2 · kind B2 · utility

Cited by	1
References	5
Claims	20
Family size	0

Assignee

Salesforce.com, Inc. · US

Inventors

Hung Le · Singapore, SG
Chu Hong Hoi · Singapore, SG

Key dates

Filing date	Feb 4, 2020
Grant date	Mar 29, 2022
Priority date	—
Expiry date	Sep 17, 2040

Classification

Technology area (CPC G)	Physics
CPC primary	G06N3/048
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Systems and methods are provided for performing a video-grounded dialogue task by a neural network model using bi-directional spatial-temporal reasoning. According to some embodiments, the systems and methods implement a dual network architecture or framework. This framework includes one network or reasoning module that learns dependencies between text and video in the direction of spatial→temporal, and another network or reasoning module that learns in the direction of temporal→spatial. The output of the multimodal reasoning modules may be combined to learn dependencies between language features in dialogues. The resulting joint representation is used as a contextual feature for the decoding components, which allow the model to generate semantically meaningful responses to users. In some embodiments, pointer networks are extended to the video-grounded dialogue task to allow the model to point to specific tokens from multiple source sequences to generate responses.
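
The two reasoning directions and the pointer-style output described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes plain dot-product attention as a stand-in for the learned reasoning modules, concatenation as the way the two outputs are combined, and fixed mixing gates where a trained model would predict them. All function and parameter names (attend, spatial_then_temporal, temporal_then_spatial, bidirectional_context, pointer_mixture) are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, keys):
    """Dot-product attention: average of `keys`, weighted by similarity to `query`.
    Assumption: stands in for the learned text-video reasoning modules."""
    weights = softmax([dot(query, k) for k in keys])
    dim = len(keys[0])
    return [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(dim)]

def spatial_then_temporal(query, video):
    """video[t][s] is the feature vector of spatial region s in frame t."""
    per_frame = [attend(query, frame) for frame in video]  # pool regions per frame
    return attend(query, per_frame)                        # then pool across frames

def temporal_then_spatial(query, video):
    num_regions = len(video[0])
    per_region = [
        attend(query, [frame[s] for frame in video])       # pool frames per region
        for s in range(num_regions)
    ]
    return attend(query, per_region)                       # then pool across regions

def bidirectional_context(query, video):
    """Combine both directions; concatenation is an assumption of this sketch."""
    return spatial_then_temporal(query, video) + temporal_then_spatial(query, video)

def pointer_mixture(vocab_probs, sources, gates):
    """Mix a vocabulary distribution with copy distributions over several sources.

    vocab_probs: dict token -> probability from the generator.
    sources: list of (tokens, attention_weights) pairs, one per source sequence.
    gates: one weight for the vocabulary plus one per source, summing to 1.
    """
    mixed = {tok: gates[0] * p for tok, p in vocab_probs.items()}
    for gate, (tokens, weights) in zip(gates[1:], sources):
        for tok, w in zip(tokens, weights):
            mixed[tok] = mixed.get(tok, 0.0) + gate * w
    return mixed

# Usage: 2 frames, 2 regions per frame, 2-dimensional features.
query = [1.0, 0.0]
video = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.2, 0.8]]]
context = bidirectional_context(query, video)  # 4 values: 2 per direction
probs = pointer_mixture({"a": 0.5, "b": 0.5}, [(["b", "c"], [0.5, 0.5])], [0.6, 0.4])
```

In this sketch the two directions differ only in the order in which the region axis and the frame axis are pooled, and the pointer step lets a token that is absent from the vocabulary (here "c") still receive probability mass by being copied from a source sequence.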

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.