Pitch-based speech conversion model training method and speech conversion system
US12300220B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Dec 23, 2024 |
| Grant date | May 13, 2025 |
| Priority date | — |
| Expiry date | Dec 23, 2044 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L2021/0135
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
The present disclosure provides a pitch-based speech conversion model training method and a speech conversion system. A prior encoder outputs an audio feature code, and a pitch extraction module extracts a pitch feature. A linear spectrum corresponding to the reference speech is input into a posterior encoder to obtain an audio latent variable. The audio feature code, a speech concatenation feature obtained by concatenating the audio feature code with the pitch feature, and the audio latent variable are then input into a temporal alignment module to obtain a converted speech code, which a decoder decodes into converted speech. Finally, the training loss of the converted speech is calculated to determine the degree of convergence of the speech conversion model.
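The data flow in the abstract can be pictured with a small sketch. This is an illustration only, not the patented implementation: every function below is a hypothetical toy stand-in, the names and feature shapes are invented for the example, and the mean-squared-error loss is a placeholder because the abstract does not say which loss is used. It also assumes that both encoders and the pitch extractor read the same reference utterance, which the abstract does not state explicitly.

```python
# Hypothetical data-flow sketch; each function is a toy stand-in, not the patented model.
import random
from typing import List

Frames = List[List[float]]  # one feature vector per time frame


def prior_encoder(frames: Frames) -> Frames:
    """Toy 'audio feature code': mean and max of each frame."""
    return [[sum(f) / len(f), max(f)] for f in frames]


def extract_pitch(frames: Frames) -> Frames:
    """Toy 'pitch feature': index of the strongest bin in each frame."""
    return [[float(f.index(max(f)))] for f in frames]


def posterior_encoder(linear_spectrum: Frames) -> Frames:
    """Toy 'audio latent variable': min and max of each spectrum frame."""
    return [[min(f), max(f)] for f in linear_spectrum]


def concatenate(code: Frames, pitch: Frames) -> Frames:
    """Speech concatenation feature: feature code joined with pitch, per frame."""
    return [c + p for c, p in zip(code, pitch)]


def temporal_alignment(code: Frames, concat: Frames, latent: Frames) -> Frames:
    """Toy alignment: joins the three inputs frame by frame (no real alignment)."""
    n = min(len(code), len(concat), len(latent))
    return [code[t] + concat[t] + latent[t] for t in range(n)]


def decoder(converted_code: Frames) -> List[float]:
    """Toy decoder: one output sample per frame."""
    return [sum(f) / len(f) for f in converted_code]


def training_loss(converted: List[float], target: List[float]) -> float:
    """Placeholder loss (mean squared error); the real loss is not given in the abstract."""
    n = min(len(converted), len(target))
    return sum((converted[t] - target[t]) ** 2 for t in range(n)) / n


def forward(reference_frames: Frames, linear_spectrum: Frames) -> List[float]:
    code = prior_encoder(reference_frames)
    pitch = extract_pitch(reference_frames)
    concat = concatenate(code, pitch)
    latent = posterior_encoder(linear_spectrum)
    converted_code = temporal_alignment(code, concat, latent)
    return decoder(converted_code)


random.seed(0)
reference_frames = [[random.random() for _ in range(5)] for _ in range(4)]
linear_spectrum = [[random.random() for _ in range(5)] for _ in range(4)]
target = [random.random() for _ in range(4)]

converted = forward(reference_frames, linear_spectrum)
loss = training_loss(converted, target)
print(len(converted), loss >= 0.0)
```

In a real training loop the loss value would be tracked across iterations, and training would stop once it plateaus; the sketch only computes it once to show where it sits in the pipeline.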
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.