Patent · US Active

System and method for lip-syncing a face to target speech using a machine learning model

US12154548B2 · kind B2 · utility

0 Cited by

1 References

15 Claims

0 Family size

Assignee

INTERNATIONAL INSTITUTE OF INFORMATION TECHNOLOGY, HYDERABAD · IN

Inventors

C.V. Jawahar · Hyderabad, IN
Rudrabha Mukhopadhyay · Hyderabad, IN
K R Prajwal · Hyderabad, IN
Vinay Namboodiri · Mumbai, IN

Key dates

Filing date	Jan 1, 2022
Grant date	Nov 26, 2024
Priority date	—
Expiry date	Jan 29, 2043

Classification

Technology area (CPC G)	Physics
CPC primary	G10L2021/105
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A processor-implemented method is provided for generating a lip-sync for a face to a target speech of a live session, with the speech in one or more languages, in sync and with improved visual quality, using a machine learning model and a pre-trained lip-sync model. The method includes (i) determining a visual representation of the face and an audio representation, the visual representation including crops of the face; (ii) modifying the crops of the face to obtain masked crops; (iii) obtaining a reference frame from the visual representation at a second timestamp; (iv) combining the masked crops at a first timestamp with the reference frame to obtain lower half crops; (v) training the machine learning model by providing historical lower half crops and historical audio representations as training data; (vi) generating lip-synced frames for the face to the target speech; and (vii) generating in-sync lip-synced frames by the pre-trained lip-sync model.
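
Steps (ii) to (iv) of the abstract can be illustrated with a toy sketch. It is not the patented implementation: it assumes that masking means blacking out the lower half of each face crop, and that "combining" means stacking the channels of the masked crop and the reference frame. The abstract specifies neither detail. The names Pixel, Crop, mask_lower_half and combine_with_reference are hypothetical, and images are plain nested lists to keep the example dependency-free.

```python
from typing import List, Tuple

Pixel = Tuple[int, ...]      # one pixel with any number of channels
Crop = List[List[Pixel]]     # rows of pixels (hypothetical toy image type)


def mask_lower_half(crop: Crop) -> Crop:
    """Return a copy of `crop` with its lower half set to zero.

    Assumption: the abstract says only that crops are modified into
    masked crops; zeroing the lower half is one plausible reading.
    """
    half = len(crop) // 2
    masked: Crop = []
    for y, row in enumerate(crop):
        if y < half:
            masked.append(list(row))  # upper half is kept as-is
        else:
            masked.append([tuple(0 for _ in px) for px in row])
    return masked


def combine_with_reference(masked: Crop, reference: Crop) -> Crop:
    """Stack the channels of a masked crop and a same-sized reference crop.

    Assumption: channel-wise stacking; the abstract says only "combining".
    """
    same_height = len(masked) == len(reference)
    if not same_height or any(len(m) != len(r) for m, r in zip(masked, reference)):
        raise ValueError("masked crop and reference must have the same size")
    return [
        [m_px + r_px for m_px, r_px in zip(m_row, r_row)]
        for m_row, r_row in zip(masked, reference)
    ]


# Toy 4x2 RGB crops: a face crop at a first timestamp and a
# reference frame taken at a second timestamp.
crop_t1: Crop = [[(10, 20, 30)] * 2 for _ in range(4)]
reference_t2: Crop = [[(1, 2, 3)] * 2 for _ in range(4)]

masked = mask_lower_half(crop_t1)
model_input = combine_with_reference(masked, reference_t2)
```

In this sketch each pixel of model_input carries six values: three from the masked crop and three from the reference frame. A generator trained as in step (v) would receive such combined inputs together with the audio representation.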

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.