Patent · US Active

Synthesizing speech from text using neural networks

US10971170B2 · kind B2 · utility

Cited by: 1
References: 4
Claims: 19
Family size: 0

Assignee

Inventors

Key dates

Filing date: Aug 8, 2018
Grant date: Apr 6, 2021
Priority date
Expiry date: Jan 8, 2039

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N3/047
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Methods, systems, and computer program products for generating, from an input character sequence, an output sequence of audio data representing the input character sequence. The output sequence of audio data includes a respective audio output sample for each of a number of time steps. One example method includes, for each of the time steps: generating a mel-frequency spectrogram for the time step by processing a representation of a respective portion of the input character sequence using a decoder neural network; generating a probability distribution over a plurality of possible audio output samples for the time step by processing the mel-frequency spectrogram for the time step using a vocoder neural network; and selecting the audio output sample for the time step from the possible audio output samples in accordance with the probability distribution.
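
The per-time-step loop described in the abstract can be sketched roughly as follows. This is an illustrative toy only, not the patented implementation: `toy_decoder` and `toy_vocoder` are hypothetical stand-ins for the decoder and vocoder neural networks, and the frame size (80 mel bins), the 256 quantized sample levels, and the round-robin choice of input portion are all assumptions made for the example. Random sampling is used as one possible way of selecting a sample "in accordance with the probability distribution".

```python
import math
import random

NUM_MEL_BINS = 80   # assumed size of one mel-frequency spectrogram frame
NUM_LEVELS = 256    # assumed number of possible (quantized) audio output samples


def toy_decoder(char_repr, step):
    """Hypothetical stand-in for the decoder network: returns one mel frame."""
    return [math.sin(0.1 * (step + 1) * (b + 1) + char_repr)
            for b in range(NUM_MEL_BINS)]


def toy_vocoder(mel_frame):
    """Hypothetical stand-in for the vocoder network: returns a probability
    distribution over NUM_LEVELS possible samples (softmax over toy logits)."""
    energy = sum(mel_frame) / len(mel_frame)   # lies in [-1, 1]
    target = (energy + 1.0) / 2.0              # mapped to [0, 1]
    logits = [-10.0 * abs(level / (NUM_LEVELS - 1) - target)
              for level in range(NUM_LEVELS)]
    peak = max(logits)                         # subtract max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def synthesize(text, num_steps, seed=0):
    """Generate one audio output sample per time step from a character sequence."""
    if not text:
        raise ValueError("input character sequence must be non-empty")
    rng = random.Random(seed)
    # Toy representation of the input: one number per character (an assumption).
    reprs = [ord(c) / 128.0 for c in text]
    samples = []
    for step in range(num_steps):
        # Portion of the input used at this step (toy choice: round-robin).
        char_repr = reprs[step % len(reprs)]
        # 1. Decoder: mel-frequency spectrogram for the time step.
        mel_frame = toy_decoder(char_repr, step)
        # 2. Vocoder: probability distribution over possible output samples.
        probs = toy_vocoder(mel_frame)
        # 3. Select the sample in accordance with the distribution (here: sampling).
        sample = rng.choices(range(NUM_LEVELS), weights=probs, k=1)[0]
        samples.append(sample)
    return samples


if __name__ == "__main__":
    print(synthesize("hello", 8))
```

In a real system the two stand-in functions would be trained networks and the selected sample indices would be decoded back to waveform amplitudes; the sketch only shows how the three steps chain together at each time step.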

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.