Synthesizing speech from text using neural networks
US10971170B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Aug 8, 2018 |
| Grant date | Apr 6, 2021 |
| Priority date | — |
| Expiry date | Jan 8, 2039 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N3/047
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Methods, systems, and computer program products for generating, from an input character sequence, an output sequence of audio data representing the input character sequence. The output sequence of audio data includes a respective audio output sample for each of a number of time steps. One example method includes, for each of the time steps: generating a mel-frequency spectrogram for the time step by processing a representation of a respective portion of the input character sequence using a decoder neural network; generating a probability distribution over a plurality of possible audio output samples for the time step by processing the mel-frequency spectrogram for the time step using a vocoder neural network; and selecting the audio output sample for the time step from the possible audio output samples in accordance with the probability distribution.
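The per-time-step loop in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toy_decoder` and `toy_vocoder` are hypothetical, untrained stand-ins for the decoder and vocoder neural networks, the sizes `NUM_MEL_BINS` and `NUM_SAMPLE_VALUES` are arbitrary, and treating one character as one time step is a simplification rather than what the patent specifies. Only the three-step structure (spectrogram frame, probability distribution, sampling) comes from the abstract.

```python
import math
import random

NUM_MEL_BINS = 8         # assumed toy size of one mel-frequency frame
NUM_SAMPLE_VALUES = 16   # assumed number of possible (quantized) audio samples


def toy_decoder(char_repr):
    """Hypothetical stand-in for the decoder network: maps a representation
    of a portion of the input characters to one mel-frequency frame."""
    return [math.sin(char_repr * (k + 1)) for k in range(NUM_MEL_BINS)]


def toy_vocoder(mel_frame):
    """Hypothetical stand-in for the vocoder network: maps a mel frame to a
    probability distribution over the possible audio output samples."""
    logits = [
        sum(m * math.cos((k + 1) * (j + 1)) for k, m in enumerate(mel_frame))
        for j in range(NUM_SAMPLE_VALUES)
    ]
    peak = max(logits)                      # subtract max for a stable softmax
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def synthesize(text, rng):
    """Run the three steps from the abstract once per time step."""
    samples = []
    for ch in text:                         # simplification: one char per step
        char_repr = ord(ch) / 128.0         # toy numeric representation
        mel = toy_decoder(char_repr)        # 1. mel-frequency frame
        probs = toy_vocoder(mel)            # 2. distribution over samples
        # 3. select the sample in accordance with the distribution
        sample = rng.choices(range(NUM_SAMPLE_VALUES), weights=probs, k=1)[0]
        samples.append(sample)
    return samples


audio = synthesize("hello", random.Random(0))
print(audio)
```

Passing a seeded `random.Random` makes the sampling step repeatable, which is convenient when comparing runs. A real system would replace both stand-in functions with trained networks.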
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.