Patent · US Active

Clockwork hierarchical variational encoder

US10923107B2 · kind B2 · utility

Cited by: 2
References: 1
Claims: 26
Family size: 0

Assignee

Inventors

Key dates

Filing date: Apr 12, 2019
Grant date: Feb 16, 2021
Priority date
Expiry date: Aug 19, 2039

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G10L2013/105
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method for representing an intended prosody in synthesized speech includes receiving a text utterance having at least one word, and selecting an utterance embedding for the text utterance. Each word in the text utterance has at least one syllable and each syllable has at least one phoneme. The utterance embedding represents an intended prosody. For each syllable, using the selected utterance embedding, the method also includes: predicting a duration of the syllable by encoding linguistic features of each phoneme of the syllable with a corresponding prosodic syllable embedding for the syllable; predicting a pitch contour of the syllable based on the predicted duration for the syllable; and generating a plurality of fixed-length predicted pitch frames based on the predicted duration for the syllable. Each fixed-length predicted pitch frame represents part of the predicted pitch contour of the syllable.
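
The per-syllable flow described in the abstract can be sketched roughly as follows. This is an illustration only, not the patented model: both predictor functions are hypothetical stand-ins that use simple arithmetic in place of learned networks, and the 5 ms frame length, the layout of the utterance embedding, and the feature values are assumptions that the abstract does not state.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

# Assumed fixed frame length; the abstract does not give a value.
FRAME_MS = 5.0


@dataclass
class Syllable:
    # One linguistic-feature vector per phoneme (values are made up).
    phoneme_features: List[List[float]]


def predict_duration_ms(syl: Syllable, utterance_embedding: List[float]) -> float:
    """Hypothetical stand-in for the syllable duration predictor.

    Toy rule: 60 ms per phoneme, scaled by the first embedding value.
    """
    return 60.0 * len(syl.phoneme_features) * utterance_embedding[0]


def predict_pitch_frames(duration_ms: float, utterance_embedding: List[float]) -> List[float]:
    """Hypothetical stand-in for the pitch predictor.

    Builds a linear pitch contour (in Hz) over the syllable and samples it
    into fixed-length frames, so the frame count follows from the duration.
    """
    n_frames = max(1, math.ceil(duration_ms / FRAME_MS))
    start_hz, end_hz = utterance_embedding[1], utterance_embedding[2]
    if n_frames == 1:
        return [start_hz]
    step = (end_hz - start_hz) / (n_frames - 1)
    return [start_hz + i * step for i in range(n_frames)]


def predict_prosody(
    syllables: List[Syllable], utterance_embedding: List[float]
) -> List[Tuple[float, List[float]]]:
    """For each syllable: predict duration first, then pitch frames from it."""
    results = []
    for syl in syllables:
        duration_ms = predict_duration_ms(syl, utterance_embedding)
        frames = predict_pitch_frames(duration_ms, utterance_embedding)
        results.append((duration_ms, frames))
    return results


# Example: one two-phoneme syllable and one single-phoneme syllable.
# Assumed embedding layout: [duration scale, start pitch Hz, end pitch Hz].
embedding = [1.0, 120.0, 100.0]
utterance = [
    Syllable(phoneme_features=[[0.1, 0.2], [0.3, 0.4]]),
    Syllable(phoneme_features=[[0.5, 0.6]]),
]
prosody = predict_prosody(utterance, embedding)
```

The point of the sketch is the ordering: the pitch frames depend on the predicted duration, so a longer syllable yields more fixed-length frames, each covering part of that syllable's pitch contour.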

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.