Patent · US Active

GAN-based speech synthesis model and training method

US11817079B1 · kind B1 · utility

Cited by	0

References	0

Claims	7

Family size	0

Assignee

NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD. · CN

Inventors

Huapeng Sima · Anfeng Town, CN
Zhiqiang Mao · Nanjing, CN

Key dates

Filing date	Jun 16, 2023
Grant date	Nov 14, 2023
Priority date	—
Expiry date	Jun 16, 2043

Classification

Technology area (CPC G)	Physics
CPC primary	G10L2013/083
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

The present disclosure provides a GAN-based speech synthesis model, a training method, and a speech synthesis method. In the speech synthesis method, to-be-converted text is obtained and converted into text phonemes; the text phonemes are digitized to obtain text data; and the text data is converted into a text vector that is input into the speech synthesis model, which outputs target audio corresponding to the to-be-converted text. When a trained generator is used to generate a target Mel-frequency spectrum, the accuracy of the generated spectrum can reach that of a standard Mel-frequency spectrum. Through continual adversarial training of the generator against a discriminator, acoustic losses of the target Mel-frequency spectrum are reduced, and acoustic losses of the target audio generated from that spectrum are reduced in turn, thereby improving the accuracy of the synthesized speech audio.
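
The text front end described in the abstract (text to phonemes, phonemes to numeric text data, text data to vectors) could be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the lexicon `TOY_LEXICON`, the function names, and the one-hot encoding are all assumptions made for the example. The abstract does not specify how phonemes are obtained or how the text vector is built, and a real system would likely use a pronunciation dictionary or grapheme-to-phoneme model and a learned embedding.

```python
# Hypothetical toy lexicon (assumption): maps words to phoneme sequences.
TOY_LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

# Assign each phoneme in the inventory a stable integer ID (sorted order).
PHONEME_IDS = {
    phoneme: index
    for index, phoneme in enumerate(
        sorted({p for seq in TOY_LEXICON.values() for p in seq})
    )
}


def text_to_phonemes(text):
    """Convert to-be-converted text into a flat list of phonemes."""
    phonemes = []
    for word in text.lower().split():
        # Raises KeyError for words missing from the toy lexicon.
        phonemes.extend(TOY_LEXICON[word])
    return phonemes


def digitize(phonemes):
    """Digitize phonemes into integer IDs (the 'text data')."""
    return [PHONEME_IDS[p] for p in phonemes]


def to_vectors(ids, size=len(PHONEME_IDS)):
    """Encode each ID as a one-hot vector (a stand-in for a learned embedding)."""
    return [[1.0 if i == idx else 0.0 for i in range(size)] for idx in ids]


phonemes = text_to_phonemes("Hello world")
text_data = digitize(phonemes)
text_vectors = to_vectors(text_data)
```

In this sketch, `text_vectors` stands in for the input that would be passed to the generator, which in the disclosure produces the target Mel-frequency spectrum.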

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.