Patent · US · Active

Attention-based clockwork hierarchical variational encoder

US12080272B2 · kind B2 · utility

Cited by: 1
References: 6
Claims: 28
Family size: 0

Key dates

Filing date: Dec 10, 2019
Grant date: Sep 3, 2024
Expiry date: Sep 9, 2040

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G10L2013/105
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method (400) for representing an intended prosody in synthesized speech includes receiving a text utterance (310) having at least one word (240) and selecting an utterance embedding (204) for the text utterance. Each word in the text utterance has at least one syllable (230), and each syllable has at least one phoneme (220). The utterance embedding represents an intended prosody. For each syllable, using the selected utterance embedding, the method also includes: predicting a duration (238) of the syllable by decoding a prosodic syllable embedding (232, 234) for the syllable, based on attention by an attention mechanism (340) to linguistic features (222) of each phoneme of the syllable; and generating a plurality of fixed-length predicted frames (260) based on the predicted duration for the syllable.
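The per-syllable loop in the abstract (attend from a syllable embedding to the linguistic features of the syllable's phonemes, predict a duration, then emit fixed-length frames) can be illustrated with a toy sketch. This is not the claimed model. The feature vectors, the use of scaled dot-product attention, the linear-plus-softplus duration decoder, the 5 ms frame length, and every name (`Phoneme`, `Syllable`, `attend`, `predict_duration`, `generate_frames`) are hypothetical choices. The utterance embedding is also passed in directly as a stand-in for the decoded prosodic syllable embedding.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical toy structures mirroring the abstract's hierarchy:
# a syllable holds one or more phonemes, each with a linguistic-feature vector.
@dataclass
class Phoneme:
    symbol: str
    features: List[float]

@dataclass
class Syllable:
    phonemes: List[Phoneme]

def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query: List[float], syllable: Syllable) -> List[float]:
    """Scaled dot-product attention of a syllable embedding over its phonemes' features."""
    d = len(query)
    scores = [
        sum(q * f for q, f in zip(query, p.features)) / math.sqrt(d)
        for p in syllable.phonemes
    ]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the phoneme feature vectors.
    return [sum(w * p.features[i] for w, p in zip(weights, syllable.phonemes))
            for i in range(d)]

def predict_duration(syllable_embedding: List[float], syllable: Syllable,
                     w: List[float], b: float) -> float:
    """Hypothetical linear decoder: duration in seconds from [embedding; context]."""
    x = syllable_embedding + attend(syllable_embedding, syllable)  # list concatenation
    if len(w) != len(x):
        raise ValueError("weight vector must match the concatenated input length")
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return math.log1p(math.exp(z))  # softplus keeps the duration positive

def generate_frames(duration_s: float, frame_s: float = 0.005) -> List[Tuple[int, float]]:
    """Split a predicted duration into fixed-length frames (assumed 5 ms each).

    Each frame is (index, relative position of its centre within the syllable);
    a real decoder would also predict per-frame prosody such as pitch and energy.
    """
    n = max(1, round(duration_s / frame_s))
    return [(i, (i + 0.5) / n) for i in range(n)]

# Usage with made-up numbers.
syl = Syllable([Phoneme("k", [1.0, 0.0]), Phoneme("a", [0.0, 1.0])])
utterance_embedding = [0.2, -0.1]    # stand-in for the prosodic syllable embedding
w = [0.0] * 4                        # toy weights over [embedding; context]
b = math.log(math.exp(0.1) - 1)      # bias chosen so that softplus(b) == 0.1 s
dur = predict_duration(utterance_embedding, syl, w, b)
frames = generate_frames(dur)
print(round(dur, 3), len(frames))    # 0.1 20
```

Rounding the predicted duration to a whole number of frames is what keeps every frame the same length: a longer syllable gets more frames, not longer ones.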

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.