GAN-based speech synthesis model and training method
US11817079B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Jun 16, 2023 |
| Grant date | Nov 14, 2023 |
| Priority date | — |
| Expiry date | Jun 16, 2043 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L2013/083
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
The present disclosure provides a GAN-based speech synthesis model, a training method for the model, and a speech synthesis method. In the speech synthesis method, to-be-converted text is obtained and converted into text phonemes; the text phonemes are digitized to obtain text data; and the text data is converted into a text vector that is input into the speech synthesis model, which outputs target audio corresponding to the to-be-converted text. When the target Mel-frequency spectrum is generated by a trained generator, its accuracy can reach that of a standard Mel-frequency spectrum. Through continual adversarial training between the generator and a discriminator, acoustic losses of the target Mel-frequency spectrum are reduced, and acoustic losses of the target audio generated from that spectrum are reduced as well, thereby improving the accuracy of the synthesized speech audio.
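The text front end described in the abstract (text to phonemes, phonemes to numeric text data, text data to a text vector) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the toy lexicon, the ARPAbet-like phoneme inventory, the `<sp>` and `<unk>` symbols, the randomly initialized embedding table, and all function names. In a real system the table would be learned jointly with the generator.

```python
import random

# Hypothetical toy lexicon: word -> phoneme sequence (ARPAbet-like symbols).
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

# Hypothetical phoneme inventory; index 0 is reserved for unknown symbols.
PHONEMES = ["<unk>", "HH", "AH", "L", "OW", "W", "ER", "D", "<sp>"]
PHONEME_TO_ID = {p: i for i, p in enumerate(PHONEMES)}


def text_to_phonemes(text):
    """Step 1: convert text into a flat phoneme list, with <sp> between words."""
    phonemes = []
    for i, word in enumerate(text.lower().split()):
        if i > 0:
            phonemes.append("<sp>")
        # Words missing from the toy lexicon map to a single <unk> symbol.
        phonemes.extend(LEXICON.get(word, ["<unk>"]))
    return phonemes


def phonemes_to_ids(phonemes):
    """Step 2: digitize phonemes into integer IDs (the 'text data')."""
    return [PHONEME_TO_ID.get(p, 0) for p in phonemes]


def make_embedding_table(vocab_size, dim, seed=0):
    """Build a random lookup table standing in for a learned embedding."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for _ in range(vocab_size)]


def ids_to_vectors(ids, table):
    """Step 3: look up one vector per ID (the 'text vector' fed to the model)."""
    return [table[i] for i in ids]


phonemes = text_to_phonemes("hello world")
ids = phonemes_to_ids(phonemes)
table = make_embedding_table(len(PHONEMES), dim=4)
vectors = ids_to_vectors(ids, table)

print(phonemes)
print(ids)
print(len(vectors), len(vectors[0]))
```

The resulting sequence of vectors is what a generator of the kind described would consume to produce a Mel-frequency spectrum; the generator, the discriminator, and the conversion from spectrum to audio are outside the scope of this sketch.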
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.