Patent · US Active

System and method for cross-speaker style transfer in text-to-speech and training data generation

US11600261B2 · kind B2 · utility

Cited by	0

References	0

Claims	20

Family size	0

Assignee

MICROSOFT TECHNOLOGY LICENSING, LLC · US

Inventors

Shifeng Pan · Gilroy, US
Lei He · Moraga, US
Yulin Li · Nanhu, CN
Sheng Zhao · Qingdao, CN
Chunling Ma · Beijing, CN

Key dates

Filing date	May 27, 2022
Grant date	Mar 7, 2023
Priority date	—
Expiry date	May 27, 2042

Classification

Technology area (CPC G)	Physics
CPC primary	G10L2021/0135
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Systems are configured for generating spectrogram data characterized by a voice timbre of a target speaker and a prosody style of a source speaker by converting a waveform of source speaker data to phonetic posterior gram (PPG) data, extracting additional prosody features from the source speaker data, and generating a spectrogram based on the PPG data and the extracted prosody features. The systems are configured to utilize/train a machine learning model for generating spectrogram data and for training a neural text-to-speech model with the generated spectrogram data.
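
The data flow described in the abstract (waveform to PPG data, prosody feature extraction, then spectrogram generation conditioned on a target speaker) can be illustrated with a minimal Python sketch. It is not the patented implementation: every function name, the 16 kHz framing, the choice of log energy and zero-crossing rate as prosody features, and the fixed projection standing in for the trained model are hypothetical placeholders chosen only to make the three stages concrete.

```python
import math
from typing import List

Matrix = List[List[float]]


def split_frames(waveform: List[float], frame_len: int = 400, hop: int = 160) -> Matrix:
    """Cut a waveform into overlapping frames (25 ms / 10 ms at an assumed 16 kHz)."""
    return [waveform[i:i + frame_len]
            for i in range(0, len(waveform) - frame_len + 1, hop)]


def toy_ppg(frames: Matrix, num_phones: int = 4) -> Matrix:
    """Hypothetical stand-in for a speech-recognition acoustic model.

    Returns one posterior vector per frame; each row sums to 1.
    """
    ppg = []
    for frame in frames:
        energy = sum(x * x for x in frame) / len(frame)
        logits = [-(energy - k / num_phones) ** 2 for k in range(num_phones)]
        peak = max(logits)
        exps = [math.exp(value - peak) for value in logits]
        total = sum(exps)
        ppg.append([e / total for e in exps])
    return ppg


def toy_prosody(frames: Matrix) -> Matrix:
    """Assumed prosody features: log energy and zero-crossing rate per frame."""
    feats = []
    for frame in frames:
        energy = sum(x * x for x in frame) / len(frame)
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        feats.append([math.log(energy + 1e-8), crossings / (len(frame) - 1)])
    return feats


def toy_spectrogram(ppg: Matrix, prosody: Matrix,
                    target_speaker: List[float], n_mels: int = 8) -> Matrix:
    """Placeholder for the trained model: a fixed projection of the per-frame
    PPG and prosody features concatenated with a target-speaker vector."""
    spec = []
    for posterior, prosody_feats in zip(ppg, prosody):
        features = posterior + prosody_feats + target_speaker
        spec.append([sum(f * math.cos((m + 1) * (j + 1)) for j, f in enumerate(features))
                     for m in range(n_mels)])
    return spec


def convert(waveform: List[float], target_speaker: List[float]) -> Matrix:
    """Source-speaker waveform in, target-timbre spectrogram frames out."""
    frames = split_frames(waveform)
    return toy_spectrogram(toy_ppg(frames), toy_prosody(frames), target_speaker)


# One second of a 220 Hz tone as a toy source utterance.
source = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
spectrogram = convert(source, target_speaker=[0.1, -0.3, 0.5])
print(len(spectrogram), len(spectrogram[0]))
```

In this sketch the PPG and prosody streams carry what was said and how it was said by the source speaker, while the target-speaker vector is the only input tied to the target voice. Spectrograms produced this way are what the abstract describes as training data for a neural text-to-speech model.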

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.