Patent · US Active

Expressive text-to-speech utilizing contextual word-level style tokens

US11322133B2 · kind B2 · utility

0Cited by
2References
20Claims
0Family size

Assignee

Inventors

Key dates

Filing dateJul 21, 2020
Grant dateMay 3, 2022
Priority date
Expiry dateJul 24, 2040

Classification

  • Technology area (CPC G)Physics
  • CPC primaryG10L25/30
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate expressive audio for input texts based on a word-level analysis of the input text. For example, the disclosed systems can utilize a multi-channel neural network to generate a character-level feature vector and a word-level feature vector based on a plurality of characters of an input text and a plurality of words of the input text, respectively. In some embodiments, the disclosed systems utilize the neural network to generate the word-level feature vector based on contextual word-level style tokens that correspond to style features associated with the input text. Based on the character-level and word-level feature vectors, the disclosed systems can generate a context-based speech map. The disclosed systems can utilize the context-based speech map to generate expressive audio for the input text.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.