Two-level text-to-speech systems using synthetic training data
US12260851B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Event | Date |
| --- | --- |
| Filing date | Jul 14, 2021 |
| Grant date | Mar 25, 2025 |
| Priority date | — |
| Expiry date | Jan 18, 2042 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L13/047
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method includes obtaining training data including a plurality of training audio signals and corresponding transcripts. Each training audio signal is spoken by a target speaker in a first accent/dialect. For each training audio signal of the training data, the method includes generating a training synthesized speech representation spoken by the target speaker in a second accent/dialect different than the first accent/dialect and training a text-to-speech (TTS) system based on the corresponding transcript and the training synthesized speech representation. The method also includes receiving an input text utterance to be synthesized into speech in the second accent/dialect. The method also includes obtaining conditioning inputs that include a speaker embedding and an accent/dialect identifier that identifies the second accent/dialect. The method also includes generating an output audio waveform corresponding to a synthesized speech representation of the input text sequence that clones the voice of the target speaker in the second accent/dialect.
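The data flow described in the abstract can be illustrated with a toy sketch. This is not the patented implementation: every name below (`TrainingExample`, `ConditioningInputs`, `toy_accent_converter`, `ToyTTS`, the accent tag `"en-GB"`) is a hypothetical stand-in, and the "models" are trivial placeholders that only show which inputs feed which step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class TrainingExample:
    """One training audio signal (first accent/dialect) and its transcript."""
    audio: Tuple[float, ...]
    transcript: str


@dataclass(frozen=True)
class ConditioningInputs:
    """Conditioning inputs named in the abstract: speaker embedding + accent id."""
    speaker_embedding: Tuple[float, ...]
    accent_id: str


def toy_accent_converter(example: TrainingExample, target_accent: str) -> Dict:
    # Placeholder for the step that generates a training synthesized speech
    # representation in the second accent/dialect. A real system would run a
    # trained model here; this stub only records what was requested.
    return {"accent": target_accent, "frames": len(example.transcript)}


def build_synthetic_training_set(
    examples: List[TrainingExample],
    converter: Callable[[TrainingExample, str], Dict],
    target_accent: str,
) -> List[Tuple[str, Dict]]:
    # Pair each transcript with its synthesized representation.
    return [(ex.transcript, converter(ex, target_accent)) for ex in examples]


class ToyTTS:
    """Stand-in for the TTS system trained on the synthetic pairs."""

    def __init__(self) -> None:
        self.known_accents = set()
        self.num_pairs = 0

    def train(self, pairs: List[Tuple[str, Dict]]) -> None:
        # "Training" here just remembers which accents were seen.
        for _transcript, representation in pairs:
            self.known_accents.add(representation["accent"])
            self.num_pairs += 1

    def synthesize(self, text: str, cond: ConditioningInputs) -> List[float]:
        if cond.accent_id not in self.known_accents:
            raise ValueError(f"accent {cond.accent_id!r} was not seen in training")
        # Fake waveform: one sample per character, scaled by the mean of the
        # speaker embedding. Purely illustrative, not audio.
        level = sum(cond.speaker_embedding) / len(cond.speaker_embedding)
        return [level] * len(text)


# Usage: synthetic pairs in the second accent, then conditioned inference.
examples = [
    TrainingExample(audio=(0.0, 0.1), transcript="hello"),
    TrainingExample(audio=(0.2,), transcript="good morning"),
]
pairs = build_synthetic_training_set(examples, toy_accent_converter, "en-GB")
tts = ToyTTS()
tts.train(pairs)
waveform = tts.synthesize("hi there", ConditioningInputs((0.5, 1.5), "en-GB"))
```

The sketch keeps the two stages separate, synthetic-data generation and TTS training, so that the trained system only ever sees transcripts paired with second-accent representations, as the abstract describes.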
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.