Patent · US Active

Two-level text-to-speech systems using synthetic training data

US12260851B2 · kind B2 · utility

Cited by	0

References	1

Claims	27

Family size	0

Assignee

Google LLC · US

Inventors

Lev Finkelstein · Netanya, IL
Chun-an Chan · Mountain View, US
Byungha Chun · Warrington, GB
Norman Casagrande · London, GB
Yu Zhang · Mountain View, US
Robert Andrew James Clark · Stapleford, GB
Vincent Wan · Cambridge, GB

Key dates

Filing date	Jul 14, 2021
Grant date	Mar 25, 2025
Priority date	—
Expiry date	Jan 18, 2042

Classification

Technology area (CPC G)	Physics
CPC primary	G10L13/047
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A method includes obtaining training data including a plurality of training audio signals and corresponding transcripts. Each training audio signal is spoken by a target speaker in a first accent/dialect. For each training audio signal of the training data, the method includes generating a training synthesized speech representation spoken by the target speaker in a second accent/dialect different than the first accent/dialect and training a text-to-speech (TTS) system based on the corresponding transcript and the training synthesized speech representation. The method also includes receiving an input text utterance to be synthesized into speech in the second accent/dialect. The method also includes obtaining conditioning inputs that include a speaker embedding and an accent/dialect identifier that identifies the second accent/dialect. The method also includes generating an output audio waveform corresponding to a synthesized speech representation of the input text sequence that clones the voice of the target speaker in the second accent/dialect.
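The data flow described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: every name here (TrainingExample, ConditioningInputs, build_synthetic_pairs, synthesize) is hypothetical, and the accent converter and TTS model are passed in as plain callables because the abstract does not specify their architecture.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class TrainingExample:
    """One training audio signal of the target speaker (first accent/dialect)."""
    audio: Sequence[float]
    transcript: str


@dataclass(frozen=True)
class ConditioningInputs:
    """Conditioning inputs named in the abstract (field names are assumptions)."""
    speaker_embedding: Tuple[float, ...]
    accent_id: str


def build_synthetic_pairs(
    examples: Sequence[TrainingExample],
    convert_accent: Callable[[Sequence[float], str], Sequence[float]],
    target_accent: str,
) -> List[Tuple[str, List[float]]]:
    """Pair each transcript with a synthesized representation in the second accent.

    `convert_accent` is a stand-in for whatever system produces the training
    synthesized speech representation; its behavior is assumed, not sourced.
    """
    pairs = []
    for example in examples:
        synthesized = convert_accent(example.audio, target_accent)
        pairs.append((example.transcript, list(synthesized)))
    return pairs


def synthesize(
    text: str,
    conditioning: ConditioningInputs,
    tts_model: Callable[[str, ConditioningInputs], List[float]],
) -> List[float]:
    """Run a (hypothetical) trained TTS model on input text plus conditioning."""
    return tts_model(text, conditioning)


# Toy stand-ins so the sketch runs end to end; they do no real signal processing.
def toy_convert_accent(audio: Sequence[float], accent: str) -> List[float]:
    return [sample * 0.5 for sample in audio]


def toy_tts_model(text: str, conditioning: ConditioningInputs) -> List[float]:
    return [float(len(text))] * len(conditioning.speaker_embedding)


examples = [
    TrainingExample(audio=[1.0, 2.0], transcript="hello"),
    TrainingExample(audio=[4.0], transcript="world"),
]
pairs = build_synthetic_pairs(examples, toy_convert_accent, target_accent="en-GB")
waveform = synthesize(
    "hi",
    ConditioningInputs(speaker_embedding=(0.1, 0.2, 0.3), accent_id="en-GB"),
    toy_tts_model,
)
```

In this sketch the (transcript, synthesized audio) pairs would feed a training loop that is deliberately omitted, and the accent identifier "en-GB" is only an illustrative value.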

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.