Patent · US Active

Attention-based clockwork hierarchical variational encoder

US12272349B2 · kind B2 · utility

Cited by: 0
References: 19
Claims: 28
Family size: 0

Assignee

Inventors

Key dates

Filing date: Oct 16, 2023
Grant date: Apr 8, 2025
Priority date
Expiry date: Oct 16, 2043

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G10L2013/105
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method for representing an intended prosody in synthesized speech includes receiving a text utterance having at least one word, and selecting an utterance embedding for the text utterance. Each word in the text utterance has at least one syllable and each syllable has at least one phoneme. The utterance embedding represents an intended prosody. For each syllable, using the selected utterance embedding, the method also includes: predicting a duration of the syllable by decoding a prosodic syllable embedding for the syllable based on attention by an attention mechanism to linguistic features of each phoneme of the syllable and generating a plurality of fixed-length predicted frames based on the predicted duration for the syllable.
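The per-syllable step in the abstract can be illustrated with a toy sketch. It is not the patented implementation. It only shows the shape of the computation: attend over the linguistic features of a syllable's phonemes, predict a duration, and derive a count of fixed-length frames from that duration. Every name here (`attend`, `predict_syllable_frames`), the dot-product attention, the softplus duration head, the way the utterance and syllable embeddings are combined, and the 5 ms frame length are assumptions made for illustration; the abstract specifies none of them.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, phoneme_features):
    """Dot-product attention of one query vector over per-phoneme feature vectors.

    Assumption: plain dot-product scoring; the abstract only says "attention".
    """
    scores = [sum(q * f for q, f in zip(query, feats)) for feats in phoneme_features]
    weights = softmax(scores)
    dim = len(phoneme_features[0])
    context = [
        sum(w * feats[i] for w, feats in zip(weights, phoneme_features))
        for i in range(dim)
    ]
    return context, weights


def predict_syllable_frames(utterance_embedding, syllable_embedding,
                            phoneme_features, frame_ms=5.0):
    """Toy duration predictor for one syllable.

    Assumptions (hypothetical, for illustration only):
    - the query is the elementwise sum of the utterance embedding and the
      prosodic syllable embedding;
    - duration is an untrained softplus head on the attention context;
    - each predicted frame covers a fixed frame_ms milliseconds.
    """
    query = [u + s for u, s in zip(utterance_embedding, syllable_embedding)]
    context, weights = attend(query, phoneme_features)
    raw = sum(context) / len(context)
    # Softplus keeps the predicted duration strictly positive.
    duration_ms = 50.0 + 100.0 * math.log1p(math.exp(raw))
    # Number of fixed-length frames needed to cover the predicted duration.
    n_frames = max(1, round(duration_ms / frame_ms))
    return duration_ms, n_frames, weights


# Usage: one syllable with two phonemes, each described by three features.
utterance_embedding = [0.2, -0.1, 0.4]
syllable_embedding = [0.1, 0.3, -0.2]
phoneme_features = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]

duration_ms, n_frames, weights = predict_syllable_frames(
    utterance_embedding, syllable_embedding, phoneme_features
)
```

In this sketch the frame length stays constant and only the number of frames changes with the predicted duration, which is one plausible reading of "fixed-length predicted frames based on the predicted duration".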

Source: USPTO / EPO open patent data (objective bibliographic data and citation counts).