Patent · US Active

System and method for lip-syncing a face to target speech using a machine learning model

US12154548B2 · kind B2 · utility

0Cited by
1References
15Claims
0Family size

Assignee

Inventors

Key dates

Filing dateJan 1, 2022
Grant dateNov 26, 2024
Priority date
Expiry dateJan 29, 2043

Classification

  • Technology area (CPC G)Physics
  • CPC primaryG10L2021/105
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

A processor-implemented method for generating a lip-sync for a face to a target speech of a live session to a speech in one or more languages in-sync with improved visual quality using a machine learning model and a pre-trained lip-sync model is provided. The method includes (i) determining a visual representation of the face and an audio representation, the visual representation includes crops of the face; (ii) modifying the crops of the face to obtain masked crops; (iii) obtaining a reference frame from the visual representation at a second timestamp; (iv) combining the masked crops at the first timestamp with the reference to obtain lower half crops; (v) training the machine learning model by providing historical lower half crops and historical audio representations as training data; (vi) generating lip-synced frames for the face to the target speech, and (vii) generating an in-sync lip-synced frames by the pre-trained lip-sync model.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.