Patent · US Active

Systems and methods for neural text-to-speech using convolutional sequence learning

US10796686B2 · kind B2 · utility

Cited by: 11
References: 6
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Aug 8, 2018
Grant date: Oct 6, 2020
Priority date
Expiry date: Oct 2, 2038

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G10L13/047
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Described herein are embodiments of a fully-convolutional attention-based neural text-to-speech (TTS) system, various embodiments of which may generally be referred to as Deep Voice 3. Embodiments of Deep Voice 3 match state-of-the-art neural speech synthesis systems in naturalness while training ten times faster. Deep Voice 3 embodiments were scaled to data set sizes unprecedented for TTS, training on more than eight hundred hours of audio from over two thousand speakers. In addition, common error modes of attention-based speech synthesis networks were identified and mitigated, and several different waveform synthesis methods were compared. Also presented are embodiments that describe how to scale inference to ten million queries per day on a single-GPU server.
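
For a sense of scale, the inference figure quoted in the abstract can be converted into an average request rate. The sketch below is only arithmetic on the stated "ten million queries per day"; it assumes load spread evenly over 24 hours, which the abstract does not claim, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_qps(queries_per_day: float) -> float:
    """Average queries per second, assuming perfectly uniform load (an assumption)."""
    return queries_per_day / SECONDS_PER_DAY


def serial_budget_ms(queries_per_day: float) -> float:
    """Mean time available per query, in milliseconds, if queries were
    handled strictly one at a time (an assumption; real servers batch)."""
    return SECONDS_PER_DAY / queries_per_day * 1000.0


if __name__ == "__main__":
    daily = 10_000_000  # figure taken from the abstract
    print(f"average rate: {average_qps(daily):.2f} queries/s")
    print(f"serial budget: {serial_budget_ms(daily):.2f} ms/query")
```

Under these assumptions the daily figure works out to roughly 116 queries per second on average. Real traffic is bursty and a server would normally batch or overlap requests, so this is an average rate, not a latency figure.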

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.