Patent · US Active

Systems and methods for neural voice cloning with a few samples

US11238843B2 · kind B2 · utility

Cited by	4

References	6

Claims	20

Family size	0

Assignee

BAIDU USA LLC · US

Inventors

Sercan Omer Arik · San Francisco, US
Jitong Chen · Sunnyvale, US
Kainan Peng · Sunnyvale, US
Wei Ping · Sunnyvale, US
Yanqi Zhou · San Jose, US

Key dates

Filing date	Sep 26, 2018
Grant date	Feb 1, 2022
Priority date	—
Expiry date	Feb 10, 2039

Classification

Technology area (CPC G)	Physics
CPC primary	G10L13/08
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Voice cloning is a highly desired capability for personalized speech interfaces. Neural network-based speech synthesis has been shown to generate high-quality speech for a large number of speakers. Neural voice cloning systems that take a few audio samples as input are presented herein. Two approaches, speaker adaptation and speaker encoding, are disclosed. Speaker adaptation embodiments are based on fine-tuning a multi-speaker generative model with a few cloning samples. Speaker encoding embodiments are based on training a separate model to directly infer a new speaker embedding from cloning audios, which is used in or with a multi-speaker generative model. Both approaches achieve good performance in terms of the naturalness of the speech and its similarity to the original speaker, even with very few cloning audios.
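
The difference between the two approaches in the abstract can be illustrated with a toy sketch. It is not the patented implementation: the additive "generative model", the embedding-only fine-tuning, the averaging "encoder", and every name below (generate, adapt_embedding, encode_speaker) are hypothetical simplifications assumed for illustration. Speaker adaptation is shown as iterative gradient descent on a speaker embedding against the cloning samples, while speaker encoding is shown as a single pass that infers the embedding from the audio alone.

```python
# Toy illustration only: hypothetical stand-ins, not the patented models.

def generate(text, emb):
    """Toy 'multi-speaker generative model': output = text features + speaker embedding."""
    return [t + e for t, e in zip(text, emb)]

def adapt_embedding(samples, dim, steps=200, lr=0.1):
    """Speaker adaptation (assumed embedding-only here): fine-tune the embedding
    by gradient descent on the mean squared error over the cloning samples."""
    emb = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for text, audio in samples:
            pred = generate(text, emb)
            for i in range(dim):
                grad[i] += 2.0 * (pred[i] - audio[i]) / len(samples)
        emb = [e - lr * g for e, g in zip(emb, grad)]
    return emb

def encode_speaker(audios):
    """Speaker encoding stand-in: infer the embedding from audio alone in one pass.
    Assumes text features average to zero, so the mean audio vector is the embedding.
    A real encoder would be a separately trained network."""
    n = len(audios)
    return [sum(a[i] for a in audios) / n for i in range(len(audios[0]))]

# Two made-up cloning samples from a speaker with a known toy embedding.
true_emb = [0.5, -0.3]
texts = [[1.0, -1.0], [-1.0, 1.0]]  # chosen to average to zero
samples = [(t, generate(t, true_emb)) for t in texts]

adapted = adapt_embedding(samples, dim=2)                  # many update steps
encoded = encode_speaker([audio for _, audio in samples])  # one pass, no text needed
print(adapted, encoded)
```

In this toy setting both routes recover approximately the same embedding. The sketch only shows the trade-off in form: adaptation spends optimization steps per new speaker, whereas encoding moves that cost into a separate model prepared ahead of time.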

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.