Patent · US Active

Multi-speaker neural text-to-speech

US11651763B2 · kind B2 · utility

1Cited by
12References
20Claims
0Family size

Assignee

Inventors

Key dates

Filing dateNov 2, 2020
Grant dateMay 16, 2023
Priority date
Expiry dateJan 23, 2041

Classification

  • Technology area (CPC G)Physics
  • CPC primaryG10L25/30
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

Described herein are systems and methods for augmenting neural speech synthesis networks with low-dimensional trainable speaker embeddings in order to generate speech from different voices from a single model. As a starting point for multi-speaker experiments, improved single-speaker model embodiments, which may be referred to generally as Deep Voice 2 embodiments, were developed, as well as a post-processing neural vocoder for Tacotron (a neural character-to-spectrogram model). New techniques for multi-speaker speech synthesis were performed for both Deep Voice 2 and Tacotron embodiments on two multi-speaker TTS datasets—showing that neural text-to-speech systems can learn hundreds of unique voices from twenty-five minutes of audio per speaker.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.