Patent · US Active

Parallel tacotron non-autoregressive and controllable TTS

US11908448B2 · kind B2 · utility

Cited by	0

References	0

Claims	22

Family size	0

Assignee

Google LLC · US

Inventors

Isaac Elias · Mountain View, US
Jonathan Shen · Mountain View, US
Yu Zhang · Mountain View, US
Ye Jia · Santa Clara, US
Ron J. Weiss · New York, US
Yonghui Wu · Fremont, US
Byungha Chun · Warrington, GB

Key dates

Filing date	May 21, 2021
Grant date	Feb 20, 2024
Priority date	—
Expiry date	Jan 14, 2042

Classification

Technology area (CPC G)	Physics
CPC primary	G06N3/048
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A method for training a non-autoregressive TTS model includes receiving training data that includes a reference audio signal and a corresponding input text sequence. The method also includes encoding the reference audio signal into a variational embedding that disentangles the style/prosody information from the reference audio signal and encoding the input text sequence into an encoded text sequence. The method also includes predicting a phoneme duration for each phoneme in the input text sequence and determining a phoneme duration loss based on the predicted phoneme durations and a reference phoneme duration. The method also includes generating one or more predicted mel-frequency spectrogram sequences for the input text sequence and determining a final spectrogram loss based on the predicted mel-frequency spectrogram sequences and a reference mel-frequency spectrogram sequence. The method also includes training the TTS model based on the final spectrogram loss and the corresponding phoneme duration loss.
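
The way the two losses in the abstract combine into one training objective can be illustrated with a minimal sketch. The abstract does not state which distance measure is used, how losses over several predicted spectrogram sequences are aggregated, or how the two terms are weighted. The L1 distance, the summation over predictions, and the `duration_weight` parameter below are therefore assumptions for illustration, and all function names are hypothetical.

```python
from typing import List, Sequence

Spectrogram = List[List[float]]  # frames x mel channels


def phoneme_duration_loss(predicted: Sequence[float],
                          reference: Sequence[float]) -> float:
    """Mean absolute error between predicted and reference phoneme durations.

    Assumption: L1 distance; the abstract only says the loss is "based on"
    the predicted and reference durations.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("duration sequences must be non-empty and equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def spectrogram_l1(predicted: Spectrogram, reference: Spectrogram) -> float:
    """Mean absolute error over all frames and mel channels (assumed L1)."""
    if len(predicted) != len(reference):
        raise ValueError("spectrograms must have the same number of frames")
    total, count = 0.0, 0
    for pred_frame, ref_frame in zip(predicted, reference):
        if len(pred_frame) != len(ref_frame):
            raise ValueError("frames must have the same number of channels")
        for p, r in zip(pred_frame, ref_frame):
            total += abs(p - r)
            count += 1
    return total / count


def final_spectrogram_loss(predictions: Sequence[Spectrogram],
                           reference: Spectrogram) -> float:
    """Sum of per-prediction losses against one reference spectrogram.

    Assumption: the "one or more" predicted sequences are each compared
    with the reference and the resulting losses are summed.
    """
    return sum(spectrogram_l1(pred, reference) for pred in predictions)


def total_training_loss(pred_durations: Sequence[float],
                        ref_durations: Sequence[float],
                        pred_spectrograms: Sequence[Spectrogram],
                        ref_spectrogram: Spectrogram,
                        duration_weight: float = 1.0) -> float:
    """Combine the final spectrogram loss and the phoneme duration loss.

    `duration_weight` is a hypothetical hyperparameter, not from the patent.
    """
    return (final_spectrogram_loss(pred_spectrograms, ref_spectrogram)
            + duration_weight * phoneme_duration_loss(pred_durations,
                                                      ref_durations))


# Toy example: two phonemes, two frames, two mel channels.
reference_mel = [[0.0, 1.0], [1.0, 0.0]]
predicted_mels = [[[0.0, 1.0], [1.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
loss = total_training_loss([2.0, 4.0], [3.0, 4.0], predicted_mels, reference_mel)
```

In a real training loop the scalar would be computed on tensors by an automatic differentiation framework and minimized with respect to the model parameters; the plain-Python version only shows how the terms fit together.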

Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.