Patent · US Active

Controlling expressivity in end-to-end speech synthesis systems

US11676573B2 · kind B2 · utility

Cited by	0

References	1

Claims	22

Family size	0

Assignee

Google LLC · US

Inventors

Daisy Stanton · San Francisco, US
Eric Dean Battenberg · Sunnyvale, US
Russell John Wyatt Skerry-Ryan · Mountain View, US
Soroosh Mariooryad · Redwood City, US
David Teh-Hwa Kao · San Francisco, US
Thomas Edward Bagby · San Francisco, US
Sean Matthew Shannon · Mountain View, US

Key dates

Filing date	Jul 16, 2020
Grant date	Jun 13, 2023
Priority date	—
Expiry date	Apr 22, 2041

Classification

Technology area (CPC G)	Physics
CPC primary	G10L13/08
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A system for generating an output audio signal includes a context encoder, a text-prediction network, and a text-to-speech (TTS) model. The context encoder is configured to receive one or more context features associated with current input text and process the one or more context features to generate a context embedding associated with the current input text. The text-prediction network is configured to process the current input text and the context embedding to predict, as output, a style embedding for the current input text. The style embedding specifies a specific prosody and/or style for synthesizing the current input text into expressive speech. The TTS model is configured to process the current input text and the style embedding to generate an output audio signal of expressive speech of the current input text. The output audio signal has the specific prosody and/or style specified by the style embedding.
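
The data flow described in the abstract (context features to context embedding, then text plus context embedding to style embedding, then text plus style embedding to audio) can be sketched roughly as follows. This is a toy illustration only, not the patented implementation: the function names, `EMBED_DIM`, the hand-written text statistics, and the sine-tone "synthesizer" are all hypothetical stand-ins for the neural networks the patent covers.

```python
import math
from typing import List, Sequence, Tuple

EMBED_DIM = 4  # hypothetical embedding size, chosen only for illustration


def context_encoder(context_features: Sequence[Sequence[float]]) -> List[float]:
    """Toy stand-in: average the context feature vectors into one context embedding."""
    if not context_features:
        return [0.0] * EMBED_DIM
    n = len(context_features)
    return [sum(f[i] for f in context_features) / n for i in range(EMBED_DIM)]


def text_prediction_network(text: str, context_embedding: Sequence[float]) -> List[float]:
    """Toy stand-in: predict a style embedding from the text and the context embedding."""
    # Crude text statistics replace a learned text encoder (an assumption).
    text_features = [
        len(text) / 100.0,
        text.count("!") / 10.0,
        text.count("?") / 10.0,
        sum(c.isupper() for c in text) / max(len(text), 1),
    ]
    return [math.tanh(t + c) for t, c in zip(text_features, context_embedding)]


def tts_model(text: str, style_embedding: Sequence[float],
              sample_rate: int = 16000) -> List[float]:
    """Toy stand-in: emit a sine tone whose pitch and gain depend on the style embedding."""
    pitch_hz = 120.0 + 80.0 * style_embedding[0]
    gain = 0.5 + 0.4 * style_embedding[1]  # stays within (0.1, 0.9) because of tanh
    n_samples = len(text) * 160  # 10 ms of audio per character at 16 kHz
    return [gain * math.sin(2 * math.pi * pitch_hz * t / sample_rate)
            for t in range(n_samples)]


def synthesize(text: str,
               context_features: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Chain the three components in the order the abstract describes."""
    context_embedding = context_encoder(context_features)
    style_embedding = text_prediction_network(text, context_embedding)
    audio = tts_model(text, style_embedding)
    return audio, style_embedding
```

Calling `synthesize("Hello there!", [[0.1, 0.2, 0.3, 0.4]])` returns a list of audio samples and the style embedding that shaped them. Changing only the context features changes the style embedding, and with it the output, which is the dependency the abstract describes.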

Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.