Patent · US Active

Synthesis of speech from text in a voice of a target speaker using neural networks

US12175963B2 · kind B2 · utility

Cited by	0

References	0

Claims	20

Family size	0

Assignee

Google LLC · US

Inventors

Ye Jia · Santa Clara, US
Zhifeng Chen · Sunnyvale, US
Yonghui Wu · Fremont, US
Jonathan Shen · Mountain View, US
Ruoming Pang · New York, US
Ron J. Weiss · New York, US
Ignacio Lopez Moreno · Brooklyn, US
Fei Ren · Beijing, CN
Yu Zhang · Mountain View, US
Quan Wang · Hoboken, US
Patrick Nguyen · Kirkland, US

Key dates

Filing date	Nov 30, 2023
Grant date	Dec 24, 2024
Priority date	—
Expiry date	Nov 30, 2043

Classification

Technology area (CPC G)	Physics
CPC primary	G10L2013/021
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech synthesis. The methods, systems, and apparatus include actions of obtaining an audio representation of speech of a target speaker, obtaining input text for which speech is to be synthesized in a voice of the target speaker, generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another, generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained using voices of reference speakers to generate audio representations, and providing the audio representation of the input text spoken in the voice of the target speaker for output.
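
The data flow described in the abstract can be sketched roughly as follows. This is only an illustrative toy under stated assumptions, not the patented implementation: the function names (`speaker_encoder`, `spectrogram_generator`, `synthesize`), the vector size, and the arithmetic inside each function are hypothetical placeholders standing in for trained neural networks. Only the order of the steps (reference audio to speaker vector, then text plus speaker vector to an audio representation) is taken from the abstract.

```python
import math
from typing import List, Sequence


def speaker_encoder(audio: Sequence[float], dim: int = 4) -> List[float]:
    """Hypothetical stand-in for the speaker encoder engine.

    Maps reference audio of any length to a fixed-length, unit-norm
    speaker vector. A real encoder would be a trained neural network.
    """
    if not audio:
        raise ValueError("audio must be non-empty")
    sums = [0.0] * dim
    for i, sample in enumerate(audio):
        sums[i % dim] += sample  # toy pooling, not a learned embedding
    norm = math.sqrt(sum(v * v for v in sums)) or 1.0  # avoid dividing by zero
    return [v / norm for v in sums]


def spectrogram_generator(
    text: str, speaker_vector: Sequence[float], n_mels: int = 8
) -> List[List[float]]:
    """Hypothetical stand-in for the spectrogram generation engine.

    Emits one toy frame per input character, conditioned on the
    speaker vector. A real generator would be a trained network.
    """
    frames = []
    for ch in text:
        base = (ord(ch) % 32) / 32.0  # toy text feature
        frame = [
            base + speaker_vector[m % len(speaker_vector)]
            for m in range(n_mels)
        ]
        frames.append(frame)
    return frames


def synthesize(reference_audio: Sequence[float], text: str) -> List[List[float]]:
    """Chain the two steps in the order given in the abstract."""
    speaker_vector = speaker_encoder(reference_audio)
    return spectrogram_generator(text, speaker_vector)


if __name__ == "__main__":
    spectrogram = synthesize([0.1, 0.2, 0.3, 0.4, 0.5], "hi")
    print(len(spectrogram), "frames of", len(spectrogram[0]), "bins")
```

In this sketch the speaker vector is computed once from the reference audio and then passed as conditioning alongside the text, so the same text with different reference audio produces a different output.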

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.