Patent · US Active

Unsupervised alignment for text to speech synthesis using neural networks

US11769481B2 · kind B2 · utility

Cited by	0
References	1
Claims	18
Family size	0

Assignee

NVIDIA Corporation · US

Inventors

Kevin Shih · Santa Clara, US
Jose Rafael Valle Gomes da Costa · Berkeley, US
Rohan Badlani · Stanford, US
Adrian Lancucki · Legnica, PL
Wei Ping · Sunnyvale, US
Bryan Catanzaro · Cupertino, US

Key dates

Filing date	Oct 7, 2021
Grant date	Sep 26, 2023
Priority date	—
Expiry date	Oct 7, 2041

Classification

Technology area (CPC G)	Physics
CPC primary	G10L2013/105
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Generation of synthetic speech from an input text sequence may be difficult when durations of individual phonemes forming the input text sequence are unknown. A predominantly parallel process may model speech rhythm as a separate generative distribution such that phoneme duration may be sampled at inference. Additional information such as pitch or energy may also be sampled to provide improved diversity for synthetic speech generation.
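The idea in the abstract of sampling phoneme durations at inference can be illustrated with a minimal sketch. It is not the patented method: the log-normal distribution is a simple stand-in for whatever generative model is actually claimed, and the function names, phoneme symbols, and parameter values are all hypothetical.

```python
import math
import random


def sample_durations(phonemes, mean_log_frames, std_log_frames, rng):
    """Draw one duration (in frames) per phoneme.

    Assumption: each phoneme's duration follows a log-normal distribution
    with hypothetical per-phoneme parameters. This stands in for a learned
    generative duration model.
    """
    durations = []
    for p in phonemes:
        log_d = rng.gauss(mean_log_frames[p], std_log_frames[p])
        # Every phoneme must occupy at least one frame.
        durations.append(max(1, round(math.exp(log_d))))
    return durations


def expand(phonemes, durations):
    """Repeat each phoneme for its sampled number of frames."""
    frames = []
    for p, d in zip(phonemes, durations):
        frames.extend([p] * d)
    return frames


# Hypothetical input and parameters, for illustration only.
phonemes = ["HH", "AH", "L", "OW"]
mean_log = {"HH": 1.2, "AH": 1.8, "L": 1.5, "OW": 2.1}
std_log = {"HH": 0.2, "AH": 0.3, "L": 0.2, "OW": 0.3}

rng = random.Random(0)
durations = sample_durations(phonemes, mean_log, std_log, rng)
frames = expand(phonemes, durations)
print(durations, len(frames))
```

Drawing again with a different seed can give a different rhythm for the same text, which is the kind of diversity the abstract refers to. Pitch or energy could be sampled per phoneme in the same way.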

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.