Patent · US Active

Parallel tacotron non-autoregressive and controllable TTS

US11908448B2 · kind B2 · utility

0Cited by
0References
22Claims
0Family size

Assignee

Inventors

Key dates

Filing dateMay 21, 2021
Grant dateFeb 20, 2024
Priority date
Expiry dateJan 14, 2042

Classification

  • Technology area (CPC G)Physics
  • CPC primaryG06N3/048
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

A method for training a non-autoregressive TTS model includes receiving training data that includes a reference audio signal and a corresponding input text sequence. The method also includes encoding the reference audio signal into a variational embedding that disentangles the style/prosody information from the reference audio signal and encoding the input text sequence into an encoded text sequence. The method also includes predicting a phoneme duration for each phoneme in the input text sequence and determining a phoneme duration loss based on the predicted phoneme durations and a reference phoneme duration. The method also includes generating one or more predicted mel-frequency spectrogram sequences for the input text sequence and determining a final spectrogram loss based on the predicted mel-frequency spectrogram sequences and a reference mel-frequency spectrogram sequence. The method also includes training the TTS model based on the final spectrogram loss and the corresponding phoneme duration loss.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.