Patent · US Active

Enhancing hybrid self-attention structure with relative-position-aware bias for speech synthesis

US11011154B2 · kind B2 · utility

Cited by	1
References	0
Claims	17
Family size	0

Assignee

TENCENT AMERICA LLC · US

Inventors

Shan Yang · Nanhu, CN
Heng Lu · Sammamish, US
Shiyin Kang · Nanhu, CN
Dong Yu · Bellevue, US

Key dates

Filing date	Feb 8, 2019
Grant date	May 18, 2021
Priority date	—
Expiry date	Jul 18, 2039

Classification

Technology area (CPC G)	Physics
CPC primary	G10L13/07
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A method of performing speech synthesis includes encoding character embeddings using any one or any combination of convolutional neural networks (CNNs) and recurrent neural networks (RNNs), applying a relative-position-aware self-attention function to each of the character embeddings and an input mel-scale spectrogram, and encoding the character embeddings to which the relative-position-aware self-attention function is applied. The method further includes concatenating the encoded character embeddings and the encoded character embeddings to which the relative-position-aware self-attention function is applied to generate an encoder output, applying a multi-head attention function to the encoder output and to the input mel-scale spectrogram to which the relative-position-aware self-attention function is applied, and predicting an output mel-scale spectrogram based on the encoder output and the input mel-scale spectrogram to which the multi-head attention function is applied.
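The abstract does not give the formula for the relative-position-aware self-attention function, so the following is only an illustrative sketch, not the claimed implementation. It assumes a single head, a learned bias table indexed by the clipped offset between positions (one common formulation of relative-position attention), and concatenation along the feature axis. The names `relative_position_self_attention`, `rel_k`, and `max_dist` are hypothetical, and the random matrix `x` merely stands in for the CNN/RNN-encoded character embeddings.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def relative_position_self_attention(x, w_q, w_k, w_v, rel_k, max_dist):
    """Single-head self-attention with a relative-position term (assumed form).

    x:     (T, d) input sequence
    w_*:   (d, d) projection matrices
    rel_k: (2 * max_dist + 1, d) embeddings, one per clipped offset j - i
    """
    T, d = x.shape
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Offset j - i for every pair, clipped to [-max_dist, max_dist],
    # then shifted to a non-negative index into rel_k.
    offsets = np.arange(T)[None, :] - np.arange(T)[:, None]
    idx = np.clip(offsets, -max_dist, max_dist) + max_dist
    a_k = rel_k[idx]                              # (T, T, d)
    content = q @ k.T                             # content-based scores
    position = np.einsum("id,ijd->ij", q, a_k)    # position-based scores
    weights = softmax((content + position) / np.sqrt(d), axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
T, d, max_dist = 6, 8, 2
x = rng.normal(size=(T, d))  # stand-in for CNN/RNN-encoded character embeddings
w_q, w_k, w_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
rel_k = rng.normal(size=(2 * max_dist + 1, d)) * 0.1

attended, weights = relative_position_self_attention(x, w_q, w_k, w_v, rel_k, max_dist)
# Assumed: the two encoder branches are joined along the feature axis.
encoder_output = np.concatenate([x, attended], axis=-1)
print(encoder_output.shape)
```

Clipping the offset keeps the bias table at a fixed size regardless of sequence length, which is one reason this formulation is often used for long character or spectrogram sequences. Whether the patent clips offsets, or how it combines the position term with the content term, is not stated in the abstract.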

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.