Systems and methods for reconstructing video data using contextually-aware multi-modal generation during signal loss
US12394405B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Mar 24, 2023 |
| Grant date | Aug 19, 2025 |
| Priority date | — |
| Expiry date | Mar 19, 2044 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L25/60
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A device may receive video data that includes a text transcript, audio sequences, and image frames, and may detect a network fluctuation. The device may process the text transcript to generate a new phrase, and may generate a response phoneme based on the new phrase. The device may generate a text embedding based on the response phoneme, and may process the audio sequences to generate a target voice sequence. The device may generate an audio embedding based on the target voice sequence, and may process the image frames to generate a target image sequence. The device may generate an image embedding based on the target image sequence, and may combine the embeddings to generate an embedding input vector. The device may generate a final voice response and a final video based on the embedding input vector, and may provide the video data, the final voice response, and the final video.
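The combination step in the abstract (text, audio, and image embeddings merged into a single embedding input vector) could be sketched as follows. The abstract does not say how the embeddings are combined, so the concatenation and per-modality L2 normalization here are assumptions, and the names `l2_normalize` and `combine_embeddings` are hypothetical, not taken from the patent.

```python
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = sum(x * x for x in vec) ** 0.5
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def combine_embeddings(
    text_emb: Sequence[float],
    audio_emb: Sequence[float],
    image_emb: Sequence[float],
) -> List[float]:
    """Build one embedding input vector from three per-modality embeddings.

    Assumption: the modalities are joined by simple concatenation after
    normalization, so no single modality dominates by magnitude alone.
    The patent abstract only states that the embeddings are combined.
    """
    combined: List[float] = []
    for emb in (text_emb, audio_emb, image_emb):
        combined.extend(l2_normalize(emb))
    return combined


if __name__ == "__main__":
    # Toy embeddings of different sizes, for illustration only.
    vector = combine_embeddings([3.0, 4.0], [0.0, 0.0], [2.0])
    print(len(vector), vector)
```

Concatenation keeps each modality in its own slice of the vector, which makes the result easy to inspect. A learned fusion layer would be an equally plausible reading of the abstract.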
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.