Patent · US Active

Synthesizing speech from text using neural networks

US12148444B2 · kind B2 · utility

0Cited by
4References
20Claims
0Family size

Assignee

Inventors

Key dates

Filing dateApr 5, 2021
Grant dateNov 19, 2024
Priority date
Expiry dateJun 27, 2043

Classification

  • Technology area (CPC G)Physics
  • CPC primaryG06N3/047
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

Methods, systems, and computer program products for generating, from an input character sequence, an output sequence of audio data representing the input character sequence. The output sequence of audio data includes a respective audio output sample for each of a number of time steps. One example method includes, for each of the time steps: generating a mel-frequency spectrogram for the time step by processing a representation of a respective portion of the input character sequence using a decoder neural network; generating a probability distribution over a plurality of possible audio output samples for the time step by processing the mel-frequency spectrogram for the time step using a vocoder neural network; and selecting the audio output sample for the time step from the possible audio output samples in accordance with the probability distribution.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.