Attention-based clockwork hierarchical variational encoder
US12272349B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Oct 16, 2023 |
| Grant date | Apr 8, 2025 |
| Priority date | — |
| Expiry date | Oct 16, 2043 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L2013/105
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method for representing an intended prosody in synthesized speech includes receiving a text utterance having at least one word, and selecting an utterance embedding for the text utterance. Each word in the text utterance has at least one syllable and each syllable has at least one phoneme. The utterance embedding represents an intended prosody. For each syllable, using the selected utterance embedding, the method also includes: predicting a duration of the syllable by decoding a prosodic syllable embedding for the syllable based on attention by an attention mechanism to linguistic features of each phoneme of the syllable and generating a plurality of fixed-length predicted frames based on the predicted duration for the syllable.
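As a rough illustration of the per-syllable step described in the abstract, the following is a minimal sketch and not the patented implementation. It assumes plain dot-product attention with the utterance embedding as the query, a toy softplus duration head, and a fixed 5 ms frame length; none of these details appear in the abstract. The names `attend`, `predict_duration_ms`, `frames_for_syllable`, `FRAME_MS`, and `duration_weights` are all hypothetical.

```python
import math

FRAME_MS = 5.0  # assumed fixed frame length; the abstract does not state a value


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, phoneme_features):
    """Dot-product attention of a query over the phonemes of one syllable.

    Assumption: the query is the utterance embedding itself. The abstract only
    says decoding is "based on attention ... to linguistic features of each phoneme".
    """
    scores = [sum(q * f for q, f in zip(query, feats)) for feats in phoneme_features]
    weights = softmax(scores)
    dim = len(phoneme_features[0])
    context = [
        sum(w * feats[i] for w, feats in zip(weights, phoneme_features))
        for i in range(dim)
    ]
    return weights, context


def predict_duration_ms(context, duration_weights, bias=0.0):
    """Toy duration head: softplus of a linear projection, so the result is positive."""
    z = sum(c * w for c, w in zip(context, duration_weights)) + bias
    return 100.0 * math.log1p(math.exp(z))  # 100 ms scale is arbitrary


def frames_for_syllable(utterance_embedding, phoneme_features, duration_weights):
    """Predict a syllable duration, then emit that many fixed-length frames."""
    _, context = attend(utterance_embedding, phoneme_features)
    duration_ms = predict_duration_ms(context, duration_weights)
    n_frames = max(1, round(duration_ms / FRAME_MS))
    # Placeholder frames: each is a copy of the attention context vector.
    frames = [list(context) for _ in range(n_frames)]
    return duration_ms, frames


if __name__ == "__main__":
    utterance_embedding = [0.5, -0.2, 0.1]
    syllable_phonemes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # two phonemes, 3 features each
    duration_ms, frames = frames_for_syllable(
        utterance_embedding, syllable_phonemes, duration_weights=[0.3, 0.3, 0.3]
    )
    print(f"{duration_ms:.1f} ms -> {len(frames)} frames of {FRAME_MS} ms")
```

The point of the sketch is the ordering: attention over the syllable's phonemes comes first, the duration is predicted from the attended result, and the number of fixed-length frames follows from that duration. A trained model would replace the hand-set weights and the placeholder frame contents.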
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.