Systems and methods for multi-speaker neural text-to-speech
US10896669B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | May 8, 2018 |
| Grant date | Jan 19, 2021 |
| Priority date | — |
| Expiry date | Sep 21, 2038 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG10L25/30
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
Described herein are systems and methods for augmenting neural speech synthesis networks with low-dimensional trainable speaker embeddings in order to generate speech from different voices from a single model. As a starting point for multi-speaker experiments, improved single-speaker model embodiments, which may be referred to generally as Deep Voice 2 embodiments, were developed, as well as a post-processing neural vocoder for Tacotron (a neural character-to-spectrogram model). New techniques for multi-speaker speech synthesis were performed for both Deep Voice 2 and Tacotron embodiments on two multi-speaker TTS datasets—showing that neural text-to-speech systems can learn hundreds of unique voices from twenty-five minutes of audio per speaker.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.