Patent · US Active

Systems and methods for reconstructing video data using contextually-aware multi-modal generation during signal loss

US12394405B2 · kind B2 · utility

Cited by: 0
References: 5
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Mar 24, 2023
Grant date: Aug 19, 2025
Priority date
Expiry date: Mar 19, 2044

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G10L25/60
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A device may receive video data that includes a text transcript, audio sequences, and image frames, and may detect a network fluctuation. The device may process the text transcript to generate a new phrase, and may generate a response phoneme based on the new phrase. The device may generate a text embedding based on the response phoneme, and may process the audio sequences to generate a target voice sequence. The device may generate an audio embedding based on the target voice sequence, and may process the image frames to generate a target image sequence. The device may generate an image embedding based on the target image sequence, and may combine the embeddings to generate an embedding input vector. The device may generate a final voice response and a final video based on the embedding input vector, and may provide the video data, the final voice response, and the final video.
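
The embedding-combination step in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how the embeddings are produced or combined, so the toy encoders (`embed_text`, `mean_pool`), the choice of concatenation in `combine_embeddings`, and all names and dimensions below are assumptions made purely for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(rows: Sequence[Sequence[float]]) -> Vector:
    """Average equal-length feature rows into one vector.

    Hypothetical stand-in for the audio and image encoders.
    """
    if not rows:
        return []
    width = len(rows[0])
    return [sum(row[i] for row in rows) / len(rows) for i in range(width)]


def embed_text(phonemes: Sequence[str], dim: int = 4) -> Vector:
    """Toy text encoder (assumption): accumulate phoneme lengths into `dim` slots."""
    vec = [0.0] * dim
    for i, phoneme in enumerate(phonemes):
        vec[i % dim] += float(len(phoneme))
    return vec


def combine_embeddings(text_emb: Vector, audio_emb: Vector, image_emb: Vector) -> Vector:
    """Combine the three modality embeddings into one input vector.

    Concatenation is an assumption; the abstract only says "combine".
    """
    return list(text_emb) + list(audio_emb) + list(image_emb)


def build_embedding_input(
    response_phonemes: Sequence[str],
    target_voice_sequence: Sequence[Sequence[float]],
    target_image_sequence: Sequence[Sequence[float]],
) -> Vector:
    """Follow the abstract's order: text, audio, and image embeddings, then combine."""
    text_emb = embed_text(response_phonemes)
    audio_emb = mean_pool(target_voice_sequence)
    image_emb = mean_pool(target_image_sequence)
    return combine_embeddings(text_emb, audio_emb, image_emb)


if __name__ == "__main__":
    vec = build_embedding_input(
        ["HH", "AH", "L", "OW"],      # response phonemes for a generated phrase
        [[0.0, 2.0], [2.0, 4.0]],     # two audio feature rows
        [[1.0, 1.0, 1.0]],            # one image feature row
    )
    print(len(vec), vec)
```

In a real system each toy encoder would be replaced by a learned model, and the combined vector would feed the generators that produce the final voice response and final video.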

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.