System and method for lip-syncing a face to target speech using a machine learning model
US12154548B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Event | Date |
| --- | --- |
| Filing date | Jan 1, 2022 |
| Grant date | Nov 26, 2024 |
| Priority date | — |
| Expiry date | Jan 29, 2043 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L2021/105
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A processor-implemented method is provided for lip-syncing a face to the target speech of a live session, in one or more languages, in sync and with improved visual quality, using a machine learning model and a pre-trained lip-sync model. The method includes: (i) determining a visual representation of the face and an audio representation, where the visual representation includes crops of the face; (ii) modifying the crops of the face to obtain masked crops; (iii) obtaining a reference frame from the visual representation at a second timestamp; (iv) combining the masked crops at a first timestamp with the reference frame to obtain lower half crops; (v) training the machine learning model with historical lower half crops and historical audio representations as training data; (vi) generating lip-synced frames for the face to the target speech; and (vii) generating in-sync lip-synced frames using the pre-trained lip-sync model.
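Steps (ii) to (iv) of the abstract can be illustrated with a minimal NumPy sketch. It is not the patented implementation. The abstract does not say how crops are masked or combined, so the sketch assumes that masking zeroes the lower half of each crop and that combining means channel-wise concatenation with the reference frame. The function names `mask_lower_half` and `combine_with_reference`, the 96x96 crop size, and the random test data are all hypothetical.

```python
import numpy as np


def mask_lower_half(crop: np.ndarray) -> np.ndarray:
    """Return a copy of an (H, W, C) face crop with its lower half zeroed.

    Assumption: the abstract only says the crops are "modified" into masked
    crops; zeroing the lower (mouth) half is one plausible reading.
    """
    masked = crop.copy()
    masked[crop.shape[0] // 2:, :, :] = 0
    return masked


def combine_with_reference(crop: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Stack a masked crop (first timestamp) with a reference frame
    (second timestamp) along the channel axis.

    Assumption: channel-wise concatenation stands in for the unspecified
    "combining" step.
    """
    if crop.shape != reference.shape:
        raise ValueError("crop and reference frame must have the same shape")
    return np.concatenate([mask_lower_half(crop), reference], axis=-1)


# Hypothetical data: two random 96x96 RGB face crops at different timestamps.
rng = np.random.default_rng(0)
crop_t1 = rng.integers(1, 256, size=(96, 96, 3), dtype=np.uint8)
reference_t2 = rng.integers(1, 256, size=(96, 96, 3), dtype=np.uint8)

model_input = combine_with_reference(crop_t1, reference_t2)
print(model_input.shape)  # (96, 96, 6)
```

Under these assumptions, the mouth region must be predicted from the audio representation, while the unmasked reference frame supplies identity and pose cues from a different timestamp.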
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.