Patent · US Active

Systems and methods for multi-speaker neural text-to-speech

US10896669B2 · kind B2 · utility

Cited by: 7
References: 6
Claims: 18
Family size: 0

Assignee

Inventors

Key dates

Filing date: May 8, 2018
Grant date: Jan 19, 2021
Priority date
Expiry date: Sep 21, 2038

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G10L25/30
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Described herein are systems and methods for augmenting neural speech synthesis networks with low-dimensional trainable speaker embeddings so that a single model can generate speech in different voices. As a starting point for multi-speaker experiments, improved single-speaker model embodiments, which may be referred to generally as Deep Voice 2 embodiments, were developed, as well as a post-processing neural vocoder for Tacotron (a neural character-to-spectrogram model). New techniques for multi-speaker speech synthesis were applied to both Deep Voice 2 and Tacotron embodiments on two multi-speaker TTS datasets, showing that neural text-to-speech systems can learn hundreds of unique voices from twenty-five minutes of audio per speaker.
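
The core idea in the abstract, one low-dimensional trainable embedding per speaker that conditions a shared network, can be illustrated with a minimal sketch. This is not the patented implementation: the dimensions, the tanh projection, the additive conditioning, and the names SpeakerConditioner, bias_for, and condition are all assumptions chosen for illustration, and the weights are random rather than trained.

```python
import math
import random

EMBED_DIM = 4      # hypothetical low-dimensional speaker embedding size
HIDDEN_DIM = 6     # hypothetical width of the layer being conditioned


def init_matrix(rows, cols, rng, scale=0.1):
    """Return a rows x cols matrix of small random values (trainable in a real model)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class SpeakerConditioner:
    """Holds one embedding per speaker and projects it to a layer's width."""

    def __init__(self, num_speakers, seed=0):
        rng = random.Random(seed)
        # One row per speaker; all other model weights are shared across speakers.
        self.embeddings = init_matrix(num_speakers, EMBED_DIM, rng)
        # Assumed projection from embedding space to the hidden layer's width.
        self.projection = init_matrix(HIDDEN_DIM, EMBED_DIM, rng)

    def bias_for(self, speaker_id):
        """Project the speaker's embedding to HIDDEN_DIM and squash it with tanh."""
        emb = self.embeddings[speaker_id]
        return [math.tanh(sum(w * e for w, e in zip(row, emb)))
                for row in self.projection]

    def condition(self, hidden, speaker_id):
        """Add the speaker-dependent bias to a hidden activation vector."""
        bias = self.bias_for(speaker_id)
        return [h + b for h, b in zip(hidden, bias)]


conditioner = SpeakerConditioner(num_speakers=3)
hidden = [0.5] * HIDDEN_DIM            # stand-in for a shared layer's activation
out_a = conditioner.condition(hidden, speaker_id=0)
out_b = conditioner.condition(hidden, speaker_id=1)
```

In this sketch the same hidden activation yields different outputs for different speaker IDs, which is the sense in which a single model can produce multiple voices; in a real system the embeddings and projection would be learned jointly with the rest of the network.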

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.