System and method for cross-speaker style transfer in text-to-speech and training data generation
US11361753B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Sep 24, 2020 |
| Grant date | Jun 14, 2022 |
| Priority date | — |
| Expiry date | Sep 24, 2040 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG10L2021/0135
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
Systems are configured for generating spectrogram data characterized by a voice timbre of a target speaker and a prosody style of source speaker by converting a waveform of source speaker data to phonetic posterior gram (PPG) data, extracting additional prosody features from the source speaker data, and generating a spectrogram based on the PPG data and the extracted prosody features. The systems are configured to utilize/train a machine learning model for generating spectrogram data and for training a neural text-to-speech model with the generated spectrogram data.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.