Patent · US Active

Two-level text-to-speech systems using synthetic training data

US12260851B2 · kind B2 · utility

Cited by: 0
References: 1
Claims: 27
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jul 14, 2021
Grant date: Mar 25, 2025
Priority date
Expiry date: Jan 18, 2042

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G10L13/047
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method includes obtaining training data including a plurality of training audio signals and corresponding transcripts. Each training audio signal is spoken by a target speaker in a first accent/dialect. For each training audio signal of the training data, the method includes generating a training synthesized speech representation spoken by the target speaker in a second accent/dialect different than the first accent/dialect and training a text-to-speech (TTS) system based on the corresponding transcript and the training synthesized speech representation. The method also includes receiving an input text utterance to be synthesized into speech in the second accent/dialect. The method also includes obtaining conditioning inputs that include a speaker embedding and an accent/dialect identifier that identifies the second accent/dialect. The method also includes generating an output audio waveform corresponding to a synthesized speech representation of the input text utterance that clones the voice of the target speaker in the second accent/dialect.
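
The data flow described in the abstract can be sketched roughly as follows. This is an illustrative outline only, not the patented implementation: every name here (TrainingExample, ConditioningInputs, build_synthetic_pairs, toy_convert_accent, ToyTTS) and the "en-GB" accent tag are hypothetical, and the accent converter and TTS model are toy stand-ins that carry labels around instead of processing audio.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class TrainingExample:
    """One recording by the target speaker in the first accent/dialect."""
    audio: Sequence[float]
    transcript: str


@dataclass(frozen=True)
class ConditioningInputs:
    """Inputs supplied at synthesis time alongside the text."""
    speaker_embedding: Tuple[float, ...]
    accent_id: str


def toy_convert_accent(audio: Sequence[float], target_accent: str) -> Dict:
    # Hypothetical stand-in: a real system would re-synthesize the audio in
    # the target accent; here we only attach the accent label to the frames.
    return {"accent": target_accent, "frames": list(audio)}


def build_synthetic_pairs(
    examples: Sequence[TrainingExample],
    convert_accent: Callable[[Sequence[float], str], Dict],
    target_accent: str,
) -> List[Tuple[str, Dict]]:
    """Pair each transcript with a synthesized second-accent representation."""
    return [
        (ex.transcript, convert_accent(ex.audio, target_accent))
        for ex in examples
    ]


class ToyTTS:
    """Placeholder TTS system that only records which accents it was trained on."""

    def __init__(self) -> None:
        self.pairs: List[Tuple[str, Dict]] = []

    def train(self, pairs: Sequence[Tuple[str, Dict]]) -> None:
        self.pairs.extend(pairs)

    def synthesize(self, text: str, cond: ConditioningInputs) -> Dict:
        trained_accents = {rep["accent"] for _, rep in self.pairs}
        if cond.accent_id not in trained_accents:
            raise ValueError(f"no training data for accent {cond.accent_id!r}")
        # Dummy "waveform": one silent sample per input character.
        return {
            "accent": cond.accent_id,
            "speaker": cond.speaker_embedding,
            "samples": [0.0] * len(text),
        }


if __name__ == "__main__":
    recordings = [
        TrainingExample(audio=[0.1, 0.2], transcript="hello"),
        TrainingExample(audio=[0.3], transcript="world"),
    ]
    tts = ToyTTS()
    tts.train(build_synthetic_pairs(recordings, toy_convert_accent, "en-GB"))
    cond = ConditioningInputs(speaker_embedding=(0.5, 0.5), accent_id="en-GB")
    print(tts.synthesize("hi there", cond)["accent"])
```

The sketch keeps the two phases separate, as the abstract does: synthetic second-accent training pairs are built and used for training first, and the speaker embedding and accent/dialect identifier enter only as conditioning inputs when new text is synthesized.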

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.