Patent · US Active

Enhancing hybrid self-attention structure with relative-position-aware bias for speech synthesis

US11011154B2 · kind B2 · utility

Cited by: 1
References: 0
Claims: 17
Family size: 0

Assignee

Inventors

Key dates

Filing date: Feb 8, 2019
Grant date: May 18, 2021
Priority date
Expiry date: Jul 18, 2039

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G10L13/07
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method of performing speech synthesis includes encoding character embeddings, using any one or any combination of convolutional neural networks (CNNs) and recurrent neural networks (RNNs), applying a relative-position-aware self attention function to each of the character embeddings and an input mel-scale spectrogram, and encoding the character embeddings to which the relative-position-aware self attention function is applied. The method further includes concatenating the encoded character embeddings and the encoded character embeddings to which the relative-position-aware self attention function is applied, to generate an encoder output, applying a multi-head attention function to the encoder output and the input mel-scale spectrogram to which the relative-position-aware self attention function is applied, and predicting an output mel-scale spectrogram, based on the encoder output and the input mel-scale spectrogram to which the multi-head attention function is applied.
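
The abstract does not give the formula for the relative-position-aware self attention function, so the following is only a minimal sketch of one common way such a bias can work, not the patented method. It assumes a single attention head, identity query/key/value projections, and one scalar bias per clipped relative offset (j - i) added to the scaled dot-product logits. The names `relative_self_attention`, `rel_bias`, and `max_dist` are hypothetical and do not come from the patent.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def relative_self_attention(x, rel_bias, max_dist):
    """Single-head self-attention with a relative-position bias (assumed form).

    x        : list of n vectors, each of length d (e.g. character embeddings)
    rel_bias : list of 2 * max_dist + 1 scalars; entry (offset + max_dist)
               is the bias for relative offset j - i, clipped to
               [-max_dist, max_dist]
    Returns (outputs, weights): n output vectors and the n x n attention matrix.
    """
    n, d = len(x), len(x[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for i in range(n):
        logits = []
        for j in range(n):
            # Content term: scaled dot product between positions i and j.
            content = sum(a * b for a, b in zip(x[i], x[j])) * scale
            # Position term: bias depends only on the clipped offset j - i.
            offset = max(-max_dist, min(max_dist, j - i))
            logits.append(content + rel_bias[offset + max_dist])
        w = softmax(logits)
        weights.append(w)
        # Output is the attention-weighted sum of the input vectors.
        outputs.append([sum(w[j] * x[j][k] for j in range(n)) for k in range(d)])
    return outputs, weights


# Three toy 2-dimensional "embeddings".
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# A strong penalty on every non-zero offset makes each position attend to itself.
out_local, w_local = relative_self_attention(x, [-50.0, 0.0, -50.0], max_dist=1)
# A zero bias reduces this to plain scaled dot-product self-attention.
out_plain, w_plain = relative_self_attention(x, [0.0, 0.0, 0.0], max_dist=1)
```

Because the bias is indexed by the offset between positions rather than by absolute position, the same table applies at every point in the sequence. In the arrangement the abstract describes, the output of such a layer would be encoded further and concatenated with the CNN/RNN encoding of the same character embeddings to form the encoder output.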

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.