Audio-speech driven animated talking face generation using a cascaded generative adversarial network
US11551394B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Mar 11, 2021 |
| Grant date | Jan 10, 2023 |
| Priority date | — |
| Expiry date | Sep 3, 2041 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG10L25/30
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
Conventional state-of-the-art methods are limited in their ability to generate realistic animation from audio on any unknown faces and cannot be easily generalized to different facial characteristics and voice accents. Further, these methods fail to produce realistic facial animation for subjects which are quite different than that of distribution of facial characteristics network has seen during training. Embodiments of the present disclosure provide systems and methods that generate audio-speech driven animated talking face using a cascaded generative adversarial network (CGAN), wherein a first GAN is used to transfer lip motion from canonical face to person-specific face. A second GAN based texture generator network is conditioned on person-specific landmark to generate high-fidelity face corresponding to the motion. Texture generator GAN is made more flexible using meta learning to adapt to unknown subject's traits and orientation of face during inference. Finally, eye-blinks are induced in the final animation face being generated.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.