Patent · US Active

Multi-speaker neural text-to-speech

US11651763B2 · kind B2 · utility

Cited by	1

References	12

Claims	20

Family size	0

Assignee

BAIDU USA LLC · US

Inventors

Sercan Omer Arik · San Francisco, US
Gregory Diamos · San Jose, US
Andrew Gibiansky · Mountain View, US
John Miller · Redmond, US
Kainan Peng · Sunnyvale, US
Wei Ping · Sunnyvale, US
Jonathan Raiman · Palo Alto, US
Yanqi Zhou · San Jose, US

Key dates

Filing date	Nov 2, 2020
Grant date	May 16, 2023
Priority date	—
Expiry date	Jan 23, 2041

Classification

Technology area (CPC G)	Physics
CPC primary	G10L25/30
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Described herein are systems and methods for augmenting neural speech synthesis networks with low-dimensional trainable speaker embeddings in order to generate speech from different voices from a single model. As a starting point for multi-speaker experiments, improved single-speaker model embodiments, which may be referred to generally as Deep Voice 2 embodiments, were developed, as well as a post-processing neural vocoder for Tacotron (a neural character-to-spectrogram model). New techniques for multi-speaker speech synthesis were performed for both Deep Voice 2 and Tacotron embodiments on two multi-speaker TTS datasets—showing that neural text-to-speech systems can learn hundreds of unique voices from twenty-five minutes of audio per speaker.
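As an illustration of what "low-dimensional trainable speaker embeddings" can mean in practice, here is a minimal sketch in plain Python. It is not taken from the patent text: the function names, the dimensions, the tanh projection, and the choice to add the projected embedding to a hidden state are all assumptions made for this example. In a real system the table and the projection weights would be learned jointly with the rest of the network; here they are only randomly initialized.

```python
import math
import random


def make_speaker_table(num_speakers, dim, seed=0):
    """One small vector per speaker (hypothetical helper, random init)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for _ in range(num_speakers)]


def make_projection(hidden_dim, dim, seed=1):
    """Weights mapping an embedding to the hidden size (assumed layer)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for _ in range(hidden_dim)]


def project(embedding, weights):
    """Linear map followed by tanh, giving one value per hidden unit."""
    return [math.tanh(sum(w * e for w, e in zip(row, embedding)))
            for row in weights]


def condition(hidden, speaker_id, table, weights):
    """Add the projected speaker vector to a shared hidden state."""
    speaker_vec = project(table[speaker_id], weights)
    return [h + s for h, s in zip(hidden, speaker_vec)]


# Three speakers share one set of model weights; only the row that is
# looked up in the embedding table differs between them.
table = make_speaker_table(num_speakers=3, dim=4)
weights = make_projection(hidden_dim=8, dim=4)
hidden = [0.0] * 8

out_a = condition(hidden, 0, table, weights)
out_b = condition(hidden, 1, table, weights)
```

The point of the sketch is that a single model serves every speaker: switching voices changes only the looked-up embedding row, so the per-speaker cost is a handful of parameters rather than a separate network.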

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.