Patent · US Active

Generating diverse and natural text-to-speech samples

US11475874B2 · kind B2 · utility

0 Cited by

0 References

20 Claims

0 Family size

Assignee

Google LLC · US

Inventors

Yu Zhang · Mountain View, US
Bhuvana Ramabhadran · Campion Road, US
Andrew Rosenberg · Brooklyn, US
Yonghui Wu · Fremont, US
Byungha Chun · Warrington, GB
Ron J. Weiss · New York, US
Yuan Cao · Holliston, US

Key dates

Filing date	Jan 29, 2021
Grant date	Oct 18, 2022
Priority date	—
Expiry date	Apr 18, 2041

Classification

Technology area (CPC G)	Physics
CPC primary	G10L2015/0631
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A method of generating diverse and natural text-to-speech (TTS) samples includes receiving a text and generating a speech sample based on the text using a TTS model. A training process trains the TTS model to generate the speech sample by receiving training samples. Each training sample includes a spectrogram and a training text corresponding to the spectrogram. For each training sample, the training process identifies speech units associated with the training text. For each speech unit, the training process generates a speech embedding, aligns the speech embedding with a portion of the spectrogram, extracts a latent feature from the aligned portion of the spectrogram, and assigns a quantized embedding to the latent feature. The training process generates the speech sample by decoding a concatenation of the speech embeddings and the quantized embeddings for the speech units associated with the training text corresponding to the spectrogram.
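The per-unit training step described in the abstract (extract a latent feature from the spectrogram frames aligned to a speech unit, assign it a quantized embedding, and concatenate that with the unit's speech embedding before decoding) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the alignment is supplied as hypothetical (start, end) frame ranges, mean pooling stands in for the learned latent-feature extractor, nearest-neighbour lookup in a fixed codebook stands in for however the model actually assigns quantized embeddings, and the decoder is omitted. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latent: Sequence[float], codebook: Sequence[Sequence[float]]) -> Vector:
    """Assign the codebook entry nearest to `latent` (plain vector quantization).

    Assumption: a fixed codebook with nearest-neighbour lookup; the patent's
    actual quantization scheme may differ.
    """
    best = min(codebook, key=lambda code: squared_distance(latent, code))
    return list(best)


def mean_pool(frames: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical stand-in for the latent-feature extractor: average the frames."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n for d in range(dim)]


def build_decoder_input(
    speech_embeddings: Sequence[Sequence[float]],
    spectrogram: Sequence[Sequence[float]],
    alignment: Sequence[Tuple[int, int]],
    codebook: Sequence[Sequence[float]],
) -> List[Vector]:
    """For each speech unit, concatenate its speech embedding with the
    quantized embedding of the latent feature from its aligned frames.

    `alignment[i]` is an assumed (start, end) frame range for unit i.
    The returned rows are what a decoder (not shown) would consume.
    """
    rows: List[Vector] = []
    for embedding, (start, end) in zip(speech_embeddings, alignment):
        latent = mean_pool(spectrogram[start:end])
        quantized = quantize(latent, codebook)
        rows.append(list(embedding) + quantized)
    return rows


if __name__ == "__main__":
    spectrogram = [[0.0, 0.1], [0.2, 0.1], [0.9, 1.0], [1.1, 0.8]]
    alignment = [(0, 2), (2, 4)]          # two speech units, two frames each
    speech_embeddings = [[1.0], [2.0]]    # toy 1-D speech embeddings
    codebook = [[0.0, 0.0], [1.0, 1.0]]   # toy 2-entry codebook
    print(build_decoder_input(speech_embeddings, spectrogram, alignment, codebook))
```

With the toy inputs above, the first unit's pooled frames land nearest the codebook entry `[0.0, 0.0]` and the second unit's nearest `[1.0, 1.0]`, so each decoder-input row is the speech embedding followed by that discrete code. Because the quantized part is drawn from a finite codebook, a different code could be chosen at inference time for the same text, which is one plausible reading of how such a model yields diverse samples.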

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.