Controllable, natural paralinguistics for text to speech synthesis
US12361925B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Dec 29, 2020 |
| Grant date | Jul 15, 2025 |
| Priority date | — |
| Expiry date | Dec 29, 2040 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L15/26
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A speech recognition module receives training data of speech and creates a representation for individual words, non-words, phonemes, and any combination. A set of speech processing detectors analyze the training data of speech from humans communicating. The set of speech processing detectors detect speech parameters that are indicative of paralinguistic effects on top of enunciated words, phonemes, and non-words in the audio stream. One or more machine learning models undergo supervised machine learning on their neural network to train on how to associate one or more mark-up markers with a textual representation, for each individual word, individual non-word, individual phoneme, and any combinations of these, that was enunciated with a particular paralinguistic effect. Each mark-up marker can correspond to its own paralinguistic effect.
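The association the abstract describes, between a textual unit (word, non-word, or phoneme) and a mark-up marker for its paralinguistic effect, can be pictured with a small data-structure sketch. This is only an illustration under stated assumptions, not the patented method: the `Unit` class, the `MARKERS` table, the tag names, and the `to_markup` function are all hypothetical, and the effect labels are supplied by hand here rather than produced by trained detectors or a neural network.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical marker vocabulary: one marker per paralinguistic effect.
# The patent abstract does not enumerate marker names; these are assumptions.
MARKERS = {"laughter": "laugh", "whisper": "whisper", "emphasis": "emph"}


@dataclass
class Unit:
    text: str                     # textual representation of the unit
    kind: str                     # "word", "non-word", or "phoneme"
    effect: Optional[str] = None  # paralinguistic effect label, if any


def to_markup(units: List[Unit]) -> str:
    """Render units as text, wrapping each affected unit in its marker."""
    parts = []
    for u in units:
        # Units with no effect, or an effect missing from MARKERS, stay plain.
        tag = MARKERS.get(u.effect) if u.effect else None
        parts.append(f"<{tag}>{u.text}</{tag}>" if tag else u.text)
    return " ".join(parts)


# Hand-labeled example standing in for detector output (an assumption).
utterance = [
    Unit("that", "word"),
    Unit("is", "word"),
    Unit("great", "word", effect="emphasis"),
    Unit("haha", "non-word", effect="laughter"),
]
marked = to_markup(utterance)
```

In a supervised setup of the kind the abstract outlines, pairs such as (`utterance` audio, `marked` text) would plausibly serve as training targets; the sketch covers only the target representation, not the training itself.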
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.