Multi-speaker neural text-to-speech synthesis
US12266342B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Dec 11, 2018 |
| Grant date | Apr 1, 2025 |
| Priority date | — |
| Expiry date | Sep 24, 2039 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L13/047
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method for generating speech through multi-speaker neural text-to-speech (TTS) synthesis is provided. A text input may be received (1410). Speaker latent space information of a target speaker may be provided through at least one speaker model (1420). At least one acoustic feature may be predicted through an acoustic feature predictor based on the text input and the speaker latent space information (1430). A speech waveform corresponding to the text input may be generated through a neural vocoder based on the at least one acoustic feature and the speaker latent space information (1440).
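The data flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the latent dimension, and the arithmetic inside each stage are hypothetical placeholders standing in for trained neural networks. The only point it shows is the wiring, in which the speaker latent space information conditions both the acoustic feature predictor (1430) and the neural vocoder (1440).

```python
import math
from typing import List


def speaker_model(speaker_id: int, dim: int = 4) -> List[float]:
    """Placeholder for step 1420: map a speaker to a latent vector.

    A real system would use a learned embedding; this toy version is
    just a deterministic function of the (hypothetical) speaker id.
    """
    return [math.sin(speaker_id + i) for i in range(dim)]


def acoustic_feature_predictor(text: str, latent: List[float]) -> List[List[float]]:
    """Placeholder for step 1430: one feature frame per input character,
    conditioned on the speaker latent vector."""
    frames = []
    for ch in text:
        base = (ord(ch) % 32) / 32.0  # toy text encoding, an assumption
        frames.append([base + s for s in latent])
    return frames


def neural_vocoder(features: List[List[float]], latent: List[float],
                   samples_per_frame: int = 8) -> List[float]:
    """Placeholder for step 1440: turn feature frames into waveform
    samples, again conditioned on the speaker latent vector."""
    gain = 1.0 + 0.1 * sum(latent) / len(latent)
    waveform = []
    for frame in features:
        level = math.tanh(sum(frame) / len(frame))
        for n in range(samples_per_frame):
            phase = 2 * math.pi * n / samples_per_frame
            waveform.append(gain * level * math.sin(phase))
    return waveform


def synthesize(text: str, speaker_id: int) -> List[float]:
    """Steps 1410 to 1440: text in, waveform samples out."""
    latent = speaker_model(speaker_id)                    # 1420
    features = acoustic_feature_predictor(text, latent)   # 1430
    return neural_vocoder(features, latent)               # 1440
```

Calling `synthesize("hello", speaker_id=1)` returns a list of float samples. Passing a different `speaker_id` changes the output for the same text, because the latent vector enters both later stages.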
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.