Duration informed attention network for text-to-speech analysis
US11468879B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Apr 29, 2019 |
| Grant date | Oct 11, 2022 |
| Priority date | — |
| Expiry date | Jun 10, 2039 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L2013/105
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method and apparatus include receiving a text input that includes a sequence of text components. Respective temporal durations of the text components are determined using a duration model. A first set of spectra is generated based on the sequence of text components. A second set of spectra is generated based on the first set of spectra and the respective temporal durations of the sequence of text components. A spectrogram frame is generated based on the second set of spectra. An audio waveform is generated based on the spectrogram frame. The audio waveform is provided as an output.
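The step in which the second set of spectra is derived from the first set and the per-component durations can be illustrated with a minimal sketch. It assumes, purely for illustration, that each text component has one spectrum vector and that its duration is a whole number of frames, so the second set is formed by repeating each vector for that many frames. The abstract does not state that this is the mechanism used; the function name `expand_by_duration` and the toy values are hypothetical.

```python
from typing import List, Sequence

Frame = List[float]


def expand_by_duration(
    component_spectra: Sequence[Frame], durations: Sequence[int]
) -> List[Frame]:
    """Repeat each component's spectrum vector for its duration in frames.

    Assumption (not stated in the abstract): durations are integer frame
    counts and expansion is plain repetition.
    """
    if len(component_spectra) != len(durations):
        raise ValueError("one duration per text component is required")
    expanded: List[Frame] = []
    for spectrum, n_frames in zip(component_spectra, durations):
        if n_frames < 0:
            raise ValueError("durations must be non-negative")
        # Copy the vector so later edits to one frame do not affect others.
        expanded.extend(list(spectrum) for _ in range(n_frames))
    return expanded


# Toy input: three text components, each with a 2-dimensional spectrum vector.
first_set = [[0.1, 0.2], [0.5, 0.6], [0.9, 1.0]]
# Hypothetical duration-model output, in frames per component.
durations = [2, 1, 3]

second_set = expand_by_duration(first_set, durations)
print(len(second_set))  # 6 frames in total (2 + 1 + 3)
```

In this sketch the length of the second set equals the sum of the durations, so the output is aligned frame by frame with the predicted timing before any spectrogram frame or waveform is generated.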
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.