Speech synthesis using deep neural networks
US8527276B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Oct 25, 2012 |
| Grant date | Sep 3, 2013 |
| Priority date | — |
| Expiry date | Oct 25, 2032 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L25/30
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method and system are disclosed for speech synthesis using deep neural networks. A neural network may be trained to map input phonetic transcriptions of training-time text strings into sequences of acoustic feature vectors, which yield predefined speech waveforms when processed by a signal generation module. The training-time text strings may correspond to written transcriptions of speech carried in the predefined speech waveforms. Subsequent to training, a run-time text string may be translated to a run-time phonetic transcription, which may include a run-time sequence of phonetic-context descriptors, each of which contains a phonetic speech unit, data indicating phonetic context, and data indicating time duration of the respective phonetic speech unit. The trained neural network may then map the run-time sequence of the phonetic-context descriptors to run-time predicted feature vectors, which may in turn be translated into synthesized speech by the signal generation module.
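The run-time path described in the abstract (phonetic-context descriptors in, predicted acoustic feature vectors out) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The phone inventory, the one-hot context encoding, the per-frame position feature, the expansion of each descriptor into `duration_frames` frames, the `TinyMLP` network with random (untrained) weights, and every name below are hypothetical. Training is not shown, and the signal generation module that would turn the feature vectors into a waveform is omitted.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical phone inventory; a real system would use a full phoneme set.
PHONES = ["sil", "h", "e", "l", "o"]
FEATURE_DIM = 4  # hypothetical size of each acoustic feature vector


@dataclass
class PhoneticContextDescriptor:
    """One run-time unit: a phone, its neighbours (context), and its duration."""
    phone: str
    left: str
    right: str
    duration_frames: int


def one_hot(phone: str) -> List[float]:
    vec = [0.0] * len(PHONES)
    vec[PHONES.index(phone)] = 1.0
    return vec


def encode_frame(d: PhoneticContextDescriptor, frame: int) -> List[float]:
    # Assumed encoding: left/centre/right one-hots plus the relative
    # position of this frame inside the phone's duration.
    position = (frame + 0.5) / d.duration_frames
    return one_hot(d.left) + one_hot(d.phone) + one_hot(d.right) + [position]


class TinyMLP:
    """One-hidden-layer network with random, untrained weights.

    Stands in for the trained deep neural network; it only shows the shapes.
    """

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w2 = [[rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]

    def forward(self, x: Sequence[float]) -> List[float]:
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]


def to_descriptors(phones: Sequence[str], durations: Sequence[int]) -> List[PhoneticContextDescriptor]:
    # Attach left/right context to each phone, padding the edges with silence.
    # Assumes len(phones) == len(durations).
    padded = ["sil"] + list(phones) + ["sil"]
    return [
        PhoneticContextDescriptor(padded[i + 1], padded[i], padded[i + 2], dur)
        for i, dur in enumerate(durations)
    ]


def predict_features(descs: Sequence[PhoneticContextDescriptor], net: TinyMLP) -> List[List[float]]:
    # One predicted feature vector per frame; each descriptor's duration
    # sets how many frames it contributes.
    return [net.forward(encode_frame(d, f)) for d in descs for f in range(d.duration_frames)]


descs = to_descriptors(["h", "e", "l", "o"], [3, 5, 4, 6])
net = TinyMLP(n_in=3 * len(PHONES) + 1, n_hidden=8, n_out=FEATURE_DIM)
frames = predict_features(descs, net)
print(len(frames), len(frames[0]))  # 18 4
```

The duration field is what lets the network emit a variable number of feature vectors per phonetic unit: here 3 + 5 + 4 + 6 = 18 frames, each of size `FEATURE_DIM`. In a real system those frames would be passed to the signal generation module rather than printed.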
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.