Patent · US Active

Expressive text-to-speech utilizing contextual word-level style tokens

US11322133B2 · kind B2 · utility

Cited by	0
References	2
Claims	20
Family size	0

Assignee

Adobe Inc. · US

Inventors

Sumit Shekhar · Bengaluru, IN
Gautam Choudhary · Sri Ganganagar, IN
Abhilasha Sancheti · College Park, US
Shubhanshu Agarwal · Agra, IN
E Santhosh Kumar · Chennai, IN
Rahul Saxena · Sunnyvale, US

Key dates

Filing date	Jul 21, 2020
Grant date	May 3, 2022
Priority date	—
Expiry date	Jul 24, 2040

Classification

Technology area (CPC G)	Physics
CPC primary	G10L25/30
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

The present disclosure relates to systems, methods, and non-transitory computer-readable media that generate expressive audio for input texts based on a word-level analysis of the input text. For example, the disclosed systems can utilize a multi-channel neural network to generate a character-level feature vector and a word-level feature vector based on a plurality of characters of an input text and a plurality of words of the input text, respectively. In some embodiments, the disclosed systems utilize the neural network to generate the word-level feature vector based on contextual word-level style tokens that correspond to style features associated with the input text. Based on the character-level and word-level feature vectors, the disclosed systems can generate a context-based speech map. The disclosed systems can utilize the context-based speech map to generate expressive audio for the input text.
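
The data flow described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the feature size `DIM`, the helper names (`embed`, `character_channel`, `word_channel`, `speech_map`), the number of style tokens, the seeded toy embeddings, and the concatenation step are all assumptions made for illustration. The word channel here scores each word in isolation, so it does not model the contextual aspect of the style tokens, and the final step of generating audio from the speech map is omitted.

```python
import math
import random

DIM = 4  # hypothetical feature size, chosen only for illustration


def embed(symbol, dim=DIM):
    """Deterministic toy embedding: seed a RNG from the symbol's text."""
    rng = random.Random(symbol)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical bank of style tokens (learned in a real system, fixed here).
STYLE_TOKENS = [embed(f"style-{i}") for i in range(3)]


def character_channel(text):
    """Character-level features: one vector per character of the input."""
    return [embed(ch) for ch in text]


def word_channel(text):
    """Word-level features: an attention-weighted mix of the style tokens."""
    vectors = []
    for word in text.split():
        query = embed(word)
        weights = softmax([dot(query, tok) for tok in STYLE_TOKENS])
        vectors.append(
            [sum(w * tok[d] for w, tok in zip(weights, STYLE_TOKENS))
             for d in range(DIM)]
        )
    return vectors


def speech_map(text):
    """Combine both channels: each character row is extended with the
    feature vector of the word it belongs to (zeros for whitespace)."""
    char_vecs = character_channel(text)
    word_vecs = word_channel(text)
    rows = []
    word_idx = -1
    prev_space = True
    for ch, cvec in zip(text, char_vecs):
        if ch.isspace():
            wvec = [0.0] * DIM
            prev_space = True
        else:
            if prev_space:  # first character of a new word
                word_idx += 1
                prev_space = False
            wvec = word_vecs[word_idx]
        rows.append(cvec + wvec)
    return rows
```

Calling `speech_map("hi there")` returns one row per input character, each of length `2 * DIM`. Concatenation is only one plausible way to merge the two channels; the abstract does not specify how the feature vectors are combined.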

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.