Patent · US Active

Audio-speech driven animated talking face generation using a cascaded generative adversarial network

US11551394B2 · kind B2 · utility

Cited by	2
References	2
Claims	12
Family size	0

Assignee

Tata Consultancy Services Limited · IN

Inventors

Sandika Biswas · Sherghati, IN
Dipanjan Das · Jersey City, US
Sanjana Sinha · Sherghati, IN
Brojeshwar Bhowmick · Sherghati, IN

Key dates

Filing date	Mar 11, 2021
Grant date	Jan 10, 2023
Priority date	—
Expiry date	Sep 3, 2041

Classification

Technology area (CPC G)	Physics
CPC primary	G10L25/30
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Conventional state-of-the-art methods are limited in their ability to generate realistic animation from audio for unknown faces and cannot be easily generalized to different facial characteristics and voice accents. Further, these methods fail to produce realistic facial animation for subjects whose facial characteristics differ substantially from the distribution the network has seen during training. Embodiments of the present disclosure provide systems and methods that generate an audio-speech driven animated talking face using a cascaded generative adversarial network (CGAN), wherein a first GAN is used to transfer lip motion from a canonical face to a person-specific face. A second GAN-based texture generator network is conditioned on the person-specific landmarks to generate a high-fidelity face corresponding to the motion. The texture generator GAN is made more flexible using meta-learning so that it can adapt to an unknown subject's traits and face orientation during inference. Finally, eye blinks are induced in the final animated face being generated.
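The cascade described in the abstract can be pictured as a three-stage dataflow: a motion stage producing person-specific landmarks, a texture stage conditioned on those landmarks, and a final blink step. The following is only a minimal sketch of that ordering, not the patented implementation. Every name in it (`CascadedPipeline`, `toy_motion`, `toy_texture`, `toy_blinks`) is hypothetical, the stages are trivial stand-ins rather than trained GANs, and both the per-window audio input and the choice to apply blinks to rendered frames are assumptions made for illustration. Meta-learning adaptation is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Landmarks = List[Tuple[float, float]]  # 2D landmark points for one frame
Frame = List[List[float]]              # toy grayscale image (rows of pixels)


@dataclass
class CascadedPipeline:
    """Hypothetical container wiring the stages in the order the abstract lists them."""
    motion_generator: Callable[[Sequence[float], Landmarks], Landmarks]
    texture_generator: Callable[[Landmarks, Frame], Frame]
    blink_inducer: Callable[[List[Frame]], List[Frame]]

    def animate(self, audio_windows, identity_landmarks, identity_image):
        frames = []
        for window in audio_windows:
            # Stage 1 (stand-in for the first GAN): person-specific landmarks.
            landmarks = self.motion_generator(window, identity_landmarks)
            # Stage 2 (stand-in for the second GAN): texture conditioned on landmarks.
            frames.append(self.texture_generator(landmarks, identity_image))
        # Final step: induce eye blinks over the generated sequence.
        return self.blink_inducer(frames)


def toy_motion(window, landmarks):
    # Assumption: shift each landmark down by the mean absolute audio amplitude.
    energy = sum(abs(s) for s in window) / max(len(window), 1)
    return [(x, y + energy) for x, y in landmarks]


def toy_texture(landmarks, image):
    # Assumption: copy the identity image and mark each landmark pixel with 1.0.
    out = [row[:] for row in image]
    height, width = len(out), len(out[0])
    for x, y in landmarks:
        out[int(round(y)) % height][int(round(x)) % width] = 1.0
    return out


def toy_blinks(frames, period=3):
    # Assumption: a "blink" zeroes the top row of every `period`-th frame.
    result = []
    for i, frame in enumerate(frames):
        frame = [row[:] for row in frame]
        if i % period == period - 1:
            frame[0] = [0.0] * len(frame[0])
        result.append(frame)
    return result


pipeline = CascadedPipeline(toy_motion, toy_texture, toy_blinks)
audio = [[0.0, 0.0], [1.0, -1.0], [2.0, 2.0]]   # three toy audio windows
identity_landmarks = [(1.0, 1.0)]
identity_image = [[0.5] * 4 for _ in range(4)]
video = pipeline.animate(audio, identity_landmarks, identity_image)
```

Keeping each stage behind a plain callable mirrors the cascaded structure: the texture stage only ever sees landmarks and an identity image, so either stage could be replaced without touching the other.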

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.