Patent · US Active

Text-to-speech using duration prediction

US12100382B2 · kind B2 · utility

Cited by	0
References	3
Claims	20
Family size	0

Assignee

Google LLC · US

Inventors

Yu Zhang · Mountain View, US
Isaac Elias · Mountain View, US
Byungha Chun · Warrington, GB
Ye Jia · Santa Clara, US
Yonghui Wu · Fremont, US
Mike Chrzanowski · Sunnyvale, US
Jonathan Shen · Mountain View, US

Key dates

Filing date	Oct 1, 2021
Grant date	Sep 24, 2024
Priority date	—
Expiry date	Dec 5, 2041

Classification

Technology area (CPC G)	Physics
CPC primary	G10L2013/105
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for synthesizing audio data from text data using duration prediction. One of the methods includes processing an input text sequence that includes a respective text element at each of multiple input time steps using a first neural network to generate a modified input sequence comprising, for each input time step, a representation of the corresponding text element in the input text sequence; processing the modified input sequence using a second neural network to generate, for each input time step, a predicted duration of the corresponding text element in the output audio sequence; upsampling the modified input sequence according to the predicted durations to generate an intermediate sequence comprising a respective intermediate element at each of a plurality of intermediate time steps; and generating an output audio sequence using the intermediate sequence.
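
The upsampling step in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible scheme, in which each text-element representation is repeated for a whole number of frames equal to its rounded predicted duration. The abstract does not say how the upsampling is performed, so the repetition scheme, the function name upsample_by_duration, and the toy vectors and durations below are all illustrative assumptions, not the claimed method.

```python
from typing import List, Sequence

Vector = List[float]


def upsample_by_duration(
    encoded: Sequence[Vector], durations: Sequence[float]
) -> List[Vector]:
    """Repeat each per-element representation for its predicted frame count.

    Assumption: durations are measured in output frames and are rounded to
    the nearest whole frame. This is a hypothetical simplification.
    """
    if len(encoded) != len(durations):
        raise ValueError("need exactly one duration per input time step")

    intermediate: List[Vector] = []
    for vector, duration in zip(encoded, durations):
        # Round half up and clamp negative predictions to zero frames.
        frames = max(0, int(duration + 0.5))
        # Copy the vector so intermediate elements do not share storage.
        intermediate.extend(list(vector) for _ in range(frames))
    return intermediate


# Toy stand-ins for the outputs of the first network (representations)
# and the second network (predicted durations); both are made up.
encoded = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
durations = [2.0, 0.6, 3.2]

intermediate = upsample_by_duration(encoded, durations)
print(len(intermediate))  # 6 intermediate time steps (2 + 1 + 3)
```

In this sketch the intermediate sequence has one element per output frame, so a downstream decoder could generate the audio sequence without having to learn the text-to-audio alignment itself.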

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.