Patent · US Active

Synthesizing speech from text using neural networks

US12148444B2 · kind B2 · utility

0 Cited by

4 References

20 Claims

0 Family size

Assignee

Google LLC · US

Inventors

Yonghui Wu · Fremont, US
Jonathan Shen · Mountain View, US
Ruoming Pang · New York, US
Ron J. Weiss · New York, US
Michael Schuster · Saratoga, US
Navdeep Jaitly · Mountain View, US
Zongheng Yang · Berkeley, US
Zhifeng Chen · Sunnyvale, US
Yu Zhang · Mountain View, US
Yuxuan Wang · Anfeng Town (安丰镇), CN
Russell John Wyatt Skerry-Ryan · Mountain View, US
Ryan M. Rifkin · Oakland, US
Ioannis Agiomyrgiannakis · London, GB

Key dates

Filing date	Apr 5, 2021
Grant date	Nov 19, 2024
Priority date	—
Expiry date	Jun 27, 2043

Classification

Technology area (CPC G)	Physics
CPC primary	G06N3/047
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Methods, systems, and computer program products for generating, from an input character sequence, an output sequence of audio data representing the input character sequence. The output sequence of audio data includes a respective audio output sample for each of a number of time steps. One example method includes, for each of the time steps: generating a mel-frequency spectrogram for the time step by processing a representation of a respective portion of the input character sequence using a decoder neural network; generating a probability distribution over a plurality of possible audio output samples for the time step by processing the mel-frequency spectrogram for the time step using a vocoder neural network; and selecting the audio output sample for the time step from the possible audio output samples in accordance with the probability distribution.
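The per-time-step loop described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the patented implementation. `toy_decoder` and `toy_vocoder` are made-up stand-ins for trained decoder and vocoder neural networks. The number of mel bins per frame (`NUM_MELS`), the number of possible audio output samples (`NUM_LEVELS`), and the fixed `chars_per_step` slicing of the input into "respective portions" are all assumptions; the abstract does not say how portions are chosen. Only the control flow follows the abstract: decoder produces a mel frame, vocoder turns it into a probability distribution, and one output sample is drawn from that distribution.

```python
import math
import random
from typing import List, Sequence

NUM_MELS = 4      # assumed number of mel bins per frame (toy value)
NUM_LEVELS = 16   # assumed number of possible audio output samples (quantization levels)


def toy_decoder(char_portion: str) -> List[float]:
    """Hypothetical stand-in for the decoder neural network.

    Maps a portion of the input characters to one mel-spectrogram frame.
    A real system would use a trained network; this is just a deterministic
    function of the character codes.
    """
    codes = [ord(c) for c in char_portion] or [0]
    return [math.sin(sum(codes) * (k + 1) * 0.01) for k in range(NUM_MELS)]


def toy_vocoder(mel_frame: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for the vocoder neural network.

    Returns a softmax probability distribution over NUM_LEVELS possible samples.
    """
    energy = sum(mel_frame)
    logits = [-((level - NUM_LEVELS / 2 - energy) ** 2) / 8.0 for level in range(NUM_LEVELS)]
    peak = max(logits)                                  # subtract max for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


def synthesize(text: str, chars_per_step: int = 2, seed: int = 0) -> List[int]:
    """Generate one quantized audio output sample per time step."""
    rng = random.Random(seed)
    samples: List[int] = []
    for start in range(0, len(text), chars_per_step):
        portion = text[start:start + chars_per_step]   # respective portion of the input (assumed slicing)
        mel = toy_decoder(portion)                      # 1. mel-spectrogram frame for this time step
        probs = toy_vocoder(mel)                        # 2. distribution over possible output samples
        sample = rng.choices(range(NUM_LEVELS), weights=probs, k=1)[0]  # 3. select by sampling
        samples.append(sample)
    return samples


if __name__ == "__main__":
    print(synthesize("hello world"))
```

Drawing the sample with `random.Random.choices` rather than always taking the most probable level mirrors the abstract's "selecting ... in accordance with the probability distribution"; seeding the generator only makes this toy reproducible.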

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.