Patent · US Active

Spatial-temporal reasoning through pretrained language models for video-grounded dialogues

US11487999B2 · kind B2 · utility

Cited by	2

References	15

Claims	20

Family size	0

Assignee

Salesforce.com, Inc. · US

Inventors

Hung Le · Singapore, SG
Chu Hong Hoi · Singapore, SG

Key dates

Filing date	Apr 28, 2020
Grant date	Nov 1, 2022
Priority date	—
Expiry date	Apr 29, 2041

Classification

Technology area (CPC G)	Physics
CPC primary	G06F40/30
WIPO field	Audio-visual technology
WIPO sector	Electrical engineering

Abstract

A system and method for generating a response in a video-grounded dialogue are provided. A video-grounded dialogue neural network language model receives video input and text input. The text input includes a dialogue history between the model and a human user and a current utterance by the user. Encoded video input is generated using video encoding layers. Encoded text input is generated using text encoding layers. The encoded video input and the encoded text input are concatenated into a single input sequence. A generative pre-trained transformer model generates the response to the current utterance from the single input sequence.
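The concatenation step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the tiny embedding size, the random weights, and the toy vocabulary are all assumptions made for illustration, and the generative pre-trained transformer that would consume the sequence is left out.

```python
import random

EMBED_DIM = 4  # hypothetical embedding size, kept small for readability


def encode_video(frame_features, weights):
    """Project each frame feature vector to EMBED_DIM with a linear map.

    `weights` holds EMBED_DIM rows, each as long as one frame feature vector.
    """
    return [
        [sum(f * w for f, w in zip(frame, row)) for row in weights]
        for frame in frame_features
    ]


def encode_text(tokens, vocab, table):
    """Look up one embedding row per token."""
    return [table[vocab[tok]] for tok in tokens]


def build_input_sequence(frame_features, history, utterance, weights, vocab, table):
    """Concatenate encoded video and encoded text into a single sequence."""
    video = encode_video(frame_features, weights)
    text = encode_text(history + utterance, vocab, table)
    return video + text  # video positions first, then dialogue tokens


# Toy inputs (assumed): 3 frames of 6 features, a short history and utterance.
rng = random.Random(0)
frames = [[rng.random() for _ in range(6)] for _ in range(3)]
weights = [[rng.random() for _ in range(6)] for _ in range(EMBED_DIM)]
history = ["what", "is", "he", "doing", "?", "he", "is", "cooking", "."]
utterance = ["is", "anyone", "else", "there", "?"]
vocab = {tok: i for i, tok in enumerate(sorted(set(history + utterance)))}
table = [[rng.random() for _ in range(EMBED_DIM)] for _ in vocab]

sequence = build_input_sequence(frames, history, utterance, weights, vocab, table)
print(len(sequence), len(sequence[0]))
```

Every position in the resulting sequence has the same width, so a single transformer could attend across video and text positions together. In a real system the random projections would be replaced by learned video and text encoding layers.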

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.