Systems and methods for neural voice cloning with a few samples
US11238843B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Sep 26, 2018 |
| Grant date | Feb 1, 2022 |
| Priority date | — |
| Expiry date | Feb 10, 2039 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG10L13/08
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
Voice cloning is a highly desired capability for personalized speech interfaces. Neural network-based speech synthesis has been shown to generate high quality speech for a large number of speakers. Neural voice cloning systems that take a few audio samples as input are presented herein. Two approaches, speaker adaptation and speaker encoding, are disclosed. Speaker adaptation embodiments are based on fine-tuning a multi-speaker generative model with a few cloning samples. Speaker encoding embodiments are based on training a separate model to directly infer a new speaker embedding from cloning audios, which is used in or with a multi-speaker generative model. Both approaches achieve good performance in terms of naturalness of the speech and its similarity to original speaker—even with very few cloning audios.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.