Patent · US Active

GAN-based speech synthesis model and training method

US11817079B1 · kind B1 · utility

0 Cited by
0 References
7 Claims
0 Family size

Assignee

Inventors

Key dates

Filing date: Jun 16, 2023
Grant date: Nov 14, 2023
Priority date
Expiry date: Jun 16, 2043

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G10L2013/083
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

The present disclosure provides a GAN-based speech synthesis model, a method for training it, and a speech synthesis method. In the speech synthesis method, to-be-converted text is obtained and converted into text phonemes; the phonemes are digitized to obtain text data; and the text data is converted into a text vector that is input into the speech synthesis model, which outputs target audio corresponding to the to-be-converted text. When the target Mel-frequency spectrum is generated by a trained generator, its accuracy can match that of a standard Mel-frequency spectrum. Through continual adversarial training of the generator against a discriminator, acoustic losses in the target Mel-frequency spectrum are reduced, and so are acoustic losses in the target audio generated from that spectrum, thereby improving the accuracy of the synthesized speech.
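The abstract describes a front end (text, then phonemes, then integer IDs, then vectors) feeding a generator that is trained adversarially against a discriminator. The sketch below illustrates those stages in plain Python under stated assumptions. The toy lexicon (`TOY_LEXICON`), the ID scheme with 0 reserved for padding, the random embedding table, and the least-squares (LSGAN-style) losses with an L1 Mel term are all hypothetical choices made for illustration. The abstract does not specify the lexicon, the vectorization, or the loss functions, and the generator and discriminator networks themselves are omitted.

```python
import random
from typing import Dict, List, Sequence

# Illustrative sketch only: the lexicon, ID scheme, embeddings and loss forms
# below are assumptions, not details taken from the patent.

# Hypothetical toy lexicon standing in for a real grapheme-to-phoneme (G2P) step.
TOY_LEXICON: Dict[str, List[str]] = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}
UNK = "<unk>"


def text_to_phonemes(text: str) -> List[str]:
    """Convert to-be-converted text into a phoneme sequence (unknown words -> UNK)."""
    phonemes: List[str] = []
    for word in text.lower().split():
        phonemes.extend(TOY_LEXICON.get(word, [UNK]))
    return phonemes


def build_phoneme_table() -> Dict[str, int]:
    """Give every phoneme symbol an integer ID; ID 0 is reserved for padding."""
    symbols = sorted({p for pron in TOY_LEXICON.values() for p in pron} | {UNK})
    return {sym: idx + 1 for idx, sym in enumerate(symbols)}


def digitize(phonemes: Sequence[str], table: Dict[str, int]) -> List[int]:
    """Digitize phonemes into integer IDs (the "text data")."""
    return [table.get(p, table[UNK]) for p in phonemes]


def make_embedding(num_ids: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Fixed random embedding table; a trained model would learn these values."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_ids)]


def to_text_vectors(ids: Sequence[int], embedding: List[List[float]]) -> List[List[float]]:
    """Look up one vector per phoneme ID; this sequence is the generator's input."""
    return [embedding[i] for i in ids]


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Assumed LSGAN-style loss: real Mel scores pushed toward 1, generated toward 0."""
    return _mean([(s - 1.0) ** 2 for s in d_real]) + _mean([s ** 2 for s in d_fake])


def generator_loss(d_fake: Sequence[float], mel_fake: Sequence[float],
                   mel_real: Sequence[float], mel_weight: float = 1.0) -> float:
    """Assumed loss: adversarial term (fool D) plus weighted L1 over flattened Mel values."""
    adv = _mean([(s - 1.0) ** 2 for s in d_fake])
    mel_l1 = _mean([abs(f - r) for f, r in zip(mel_fake, mel_real)])
    return adv + mel_weight * mel_l1


if __name__ == "__main__":
    table = build_phoneme_table()
    ids = digitize(text_to_phonemes("hello world"), table)
    vectors = to_text_vectors(ids, make_embedding(len(table) + 1, dim=4))
    print(ids)                            # [5, 2, 6, 7, 8, 4, 6, 3]
    print(len(vectors), len(vectors[0]))  # 8 4
```

Least-squares losses were picked here only because they are easy to write without a tensor library; in a real system the generator would map the text vectors to a Mel-frequency spectrum, and both networks would be updated alternately using losses like these.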

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.