Patent · US Active

Generating diverse and natural text-to-speech samples

US11475874B2 · kind B2 · utility

Cited by: 0
References: 0
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jan 29, 2021
Grant date: Oct 18, 2022
Priority date:
Expiry date: Apr 18, 2041

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G10L2015/0631
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method of generating diverse and natural text-to-speech (TTS) samples includes receiving a text and generating a speech sample based on the text using a TTS model. A training process trains the TTS model to generate the speech sample by receiving training samples. Each training sample includes a spectrogram and a training text corresponding to the spectrogram. For each training sample, the training process identifies speech units associated with the training text. For each speech unit, the training process generates a speech embedding, aligns the speech embedding with a portion of the spectrogram, extracts a latent feature from the aligned portion of the spectrogram, and assigns a quantized embedding to the latent feature. The training process generates the speech sample by decoding a concatenation of the speech embeddings and the quantized embeddings for the speech units associated with the training text corresponding to the spectrogram.
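The per-speech-unit training steps in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation. The alignment is assumed to be given as a number of spectrogram frames per speech unit (`durations`). The latent-feature extractor is a hypothetical mean-pooling stand-in for a learned encoder. Quantization is assumed to be a nearest-neighbor lookup in a codebook, as in VQ-VAE-style models. All function names are hypothetical, and the decoder is not shown: the sketch stops at the concatenated decoder inputs.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def nearest_code(latent: Sequence[float],
                 codebook: Sequence[Sequence[float]]) -> Tuple[int, Vector]:
    """Assumed quantizer: return the index and a copy of the codebook entry
    closest to the latent feature by squared Euclidean distance."""
    best = min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(latent, codebook[i])))
    return best, list(codebook[best])


def mean_pool(frames: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical latent-feature extractor: average the aligned frames
    bin by bin (a stand-in for a learned encoder)."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def encode_units(spectrogram: Sequence[Sequence[float]],
                 unit_embeddings: Sequence[Sequence[float]],
                 durations: Sequence[int],
                 codebook: Sequence[Sequence[float]]) -> Tuple[List[Vector], List[int]]:
    """For each speech unit: take its aligned span of frames, extract a latent
    feature, assign a quantized embedding, and concatenate
    [speech embedding ; quantized embedding] as the decoder input."""
    if len(unit_embeddings) != len(durations):
        raise ValueError("need exactly one duration per speech unit")
    if any(d < 1 for d in durations) or sum(durations) != len(spectrogram):
        raise ValueError("durations must be positive and cover every frame")

    decoder_inputs: List[Vector] = []
    codes: List[int] = []
    start = 0
    for embedding, duration in zip(unit_embeddings, durations):
        aligned = spectrogram[start:start + duration]  # assumed alignment
        start += duration
        latent = mean_pool(aligned)
        index, quantized = nearest_code(latent, codebook)
        codes.append(index)
        decoder_inputs.append(list(embedding) + quantized)
    return decoder_inputs, codes


if __name__ == "__main__":
    # Toy data: 5 frames with 2 bins each, 2 speech units, 2 codebook entries.
    spec = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [0.8, 1.2], [1.0, 0.8]]
    inputs, codes = encode_units(spec, [[0.5], [-0.5]], [2, 3],
                                 [[0.0, 0.0], [1.0, 1.0]])
    print(codes)   # [0, 1]
    print(inputs)  # [[0.5, 0.0, 0.0], [-0.5, 1.0, 1.0]]
```

In this toy run the first unit's frames average to [0.1, 0.0] and snap to code 0, while the second unit's frames average to roughly [0.93, 1.0] and snap to code 1. In a real model the codebook, encoder, and decoder would be learned jointly; here they are fixed so the data flow is easy to follow.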

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.