Patent · US Active

Spatial-temporal reasoning through pretrained language models for video-grounded dialogues

US11487999B2 · kind B2 · utility

Cited by: 2
References: 15
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Apr 28, 2020
Grant date: Nov 1, 2022
Priority date
Expiry date: Apr 29, 2041

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/30
  • WIPO field: Audio-visual technology
  • WIPO sector: Electrical engineering

Abstract

A system and method for generating a response in a video-grounded dialogue are provided. A video-grounded dialogue neural network language model receives video input and text input. The text input includes a dialogue history between the model and a human user and a current utterance by the user. Encoded video input is generated using video encoding layers. Encoded text input is generated using text encoding layers. The encoded video input and the encoded text input are concatenated into a single input sequence. A generative pre-trained transformer model generates the response to the current utterance from the single input sequence.
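
The input-construction step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names (`project`, `encode_text`, `build_input_sequence`), the embedding width, the random weights, and the use of a single linear projection as the "video encoding layers" are all assumptions made for illustration. The generative pre-trained transformer that would consume the sequence is omitted.

```python
import random

EMBED_DIM = 4  # hypothetical embedding width shared by both modalities


def make_matrix(rows, cols, rng):
    """Random matrix standing in for learned weights or extracted features."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def project(features, weights):
    """Assumed video encoding layer: map each feature vector into embedding space.

    `weights` has shape (in_dim, EMBED_DIM); zip(*weights) iterates its columns.
    """
    return [
        [sum(f * w for f, w in zip(vec, col)) for col in zip(*weights)]
        for vec in features
    ]


def encode_text(token_ids, table):
    """Assumed text encoding layer: look up one embedding per token id."""
    return [list(table[t]) for t in token_ids]


def build_input_sequence(video_features, history_ids, utterance_ids,
                         video_weights, token_table):
    """Concatenate encoded video and encoded text into one input sequence."""
    encoded_video = project(video_features, video_weights)
    # The text input is the dialogue history followed by the current utterance.
    encoded_text = encode_text(history_ids + utterance_ids, token_table)
    return encoded_video + encoded_text


rng = random.Random(0)
video_features = make_matrix(3, 6, rng)        # 3 video positions, 6-dim features
video_weights = make_matrix(6, EMBED_DIM, rng)
token_table = make_matrix(10, EMBED_DIM, rng)  # toy vocabulary of 10 tokens
history_ids = [1, 2, 3]                        # toy dialogue history
utterance_ids = [4, 5]                         # toy current utterance

sequence = build_input_sequence(video_features, history_ids, utterance_ids,
                                video_weights, token_table)
print(len(sequence), len(sequence[0]))
```

In this toy setup the resulting sequence has eight positions (three video, five text), each of width `EMBED_DIM`. Projecting both modalities to the same width is what allows them to be placed in one sequence for a transformer to attend over.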

Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.