Patent · US Active

Systems and methods for neural voice cloning with a few samples

US11238843B2 · kind B2 · utility

Cited by: 4
References: 6
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Sep 26, 2018
Grant date: Feb 1, 2022
Priority date
Expiry date: Feb 10, 2039

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G10L13/08
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Voice cloning is a highly desired capability for personalized speech interfaces. Neural network-based speech synthesis has been shown to generate high quality speech for a large number of speakers. Neural voice cloning systems that take a few audio samples as input are presented herein. Two approaches, speaker adaptation and speaker encoding, are disclosed. Speaker adaptation embodiments are based on fine-tuning a multi-speaker generative model with a few cloning samples. Speaker encoding embodiments are based on training a separate model to directly infer a new speaker embedding from cloning audios, which is used in or with a multi-speaker generative model. Both approaches achieve good performance in terms of naturalness of the speech and its similarity to original speaker—even with very few cloning audios.
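
The two approaches named in the abstract can be contrasted with a minimal toy sketch. This is not the patented implementation: the additive `synthesize` model, the 4-dimensional features, and the names `adapt_speaker`, `encode_speaker` and `encoder_weights` are all hypothetical stand-ins chosen for illustration. The adaptation path here updates only the speaker embedding while the toy model stays frozen, which is a simplifying assumption, and the encoder is assumed to be already trained.

```python
DIM = 4  # toy feature size (assumption, not from the patent)

def synthesize(text_features, speaker_embedding):
    """Toy stand-in for a multi-speaker generative model:
    output = text features + per-speaker offset."""
    return [t + s for t, s in zip(text_features, speaker_embedding)]

def adapt_speaker(samples, steps=200, lr=0.1):
    """Speaker adaptation (sketch): fit a new speaker embedding by
    gradient descent on a few (text, audio) cloning samples."""
    emb = [0.0] * DIM
    for _ in range(steps):
        grad = [0.0] * DIM
        for text, audio in samples:
            pred = synthesize(text, emb)
            for i in range(DIM):
                # Gradient of the mean squared error w.r.t. the embedding.
                grad[i] += 2.0 * (pred[i] - audio[i]) / len(samples)
        emb = [e - lr * g for e, g in zip(emb, grad)]
    return emb

def encode_speaker(audios, encoder_weights):
    """Speaker encoding (sketch): a separate model maps cloning audio
    directly to an embedding in one pass, with no fine-tuning.
    Here: mean-pool over samples, then a per-dimension scale."""
    mean_audio = [sum(a[i] for a in audios) / len(audios) for i in range(DIM)]
    return [w * m for w, m in zip(encoder_weights, mean_audio)]

# Two cloning samples from an unseen speaker (made-up numbers).
# The text features are chosen to average to zero so the toy encoder works.
true_emb = [0.5, -1.0, 0.25, 2.0]
texts = [[1.0, 2.0, -1.0, 0.5], [-1.0, -2.0, 1.0, -0.5]]
samples = [(t, synthesize(t, true_emb)) for t in texts]

adapted = adapt_speaker(samples)
encoded = encode_speaker([audio for _, audio in samples], [1.0] * DIM)

# Either embedding can then be used with the multi-speaker model.
new_utterance = synthesize([0.0, 1.0, 0.0, 1.0], encoded)
```

In this sketch, adaptation needs paired text and audio plus an optimization loop per new speaker, while encoding needs only the audio and a single forward pass through the separate encoder.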

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.