Enhancing hybrid self-attention structure with relative-position-aware bias for speech synthesis
US11011154B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Event | Date |
|---|---|
| Filing date | Feb 8, 2019 |
| Grant date | May 18, 2021 |
| Priority date | — |
| Expiry date | Jul 18, 2039 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L13/07
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method of performing speech synthesis includes encoding character embeddings using any one or any combination of convolutional neural networks (CNNs) and recurrent neural networks (RNNs), applying a relative-position-aware self-attention function to each of the character embeddings and an input mel-scale spectrogram, and encoding the character embeddings to which the relative-position-aware self-attention function is applied. The method further includes concatenating the encoded character embeddings and the encoded character embeddings to which the relative-position-aware self-attention function is applied, to generate an encoder output; applying a multi-head attention function to the encoder output and to the input mel-scale spectrogram to which the relative-position-aware self-attention function is applied; and predicting an output mel-scale spectrogram based on the encoder output and the input mel-scale spectrogram to which the multi-head attention function is applied.
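The abstract does not define the relative-position-aware self-attention function itself. One common formulation is the relative position representation of Shaw, Uszkoreit and Vaswani (2018), in which learned vectors indexed by the clipped distance j − i are added to the keys and values inside scaled dot-product attention. The sketch below assumes that formulation, a single head, and plain-Python lists; the function names, the clipping distance `k`, and the tables `rel_k` and `rel_v` are hypothetical illustrations, not the patent's implementation.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product: rows of `a` times columns of `b`."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def add(u: Vector, v: Vector) -> Vector:
    return [x + y for x, y in zip(u, v)]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def clip(distance: int, k: int) -> int:
    """Clamp a relative distance j - i into [-k, k]."""
    return max(-k, min(k, distance))


def relative_self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix,
                            rel_k: Matrix, rel_v: Matrix, k: int) -> Matrix:
    """Single-head self-attention with relative-position terms.

    Assumption: Shaw et al. (2018)-style relative positions, not the patent's code.
    x:            n x d_model inputs (e.g. character embeddings or mel frames)
    w_q/w_k/w_v:  d_model x d_z query/key/value projections
    rel_k, rel_v: (2k + 1) x d_z tables, row index clip(j - i, k) + k
    """
    q, keys, vals = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    n, d_z = len(x), len(w_q[0])
    out = []
    for i in range(n):
        # score_ij = q_i . (k_j + a^K_ij) / sqrt(d_z)
        scores = [dot(q[i], add(keys[j], rel_k[clip(j - i, k) + k])) / math.sqrt(d_z)
                  for j in range(n)]
        alpha = softmax(scores)
        # z_i = sum_j alpha_ij * (v_j + a^V_ij)
        z = [0.0] * d_z
        for j in range(n):
            shifted = add(vals[j], rel_v[clip(j - i, k) + k])
            z = add(z, [alpha[j] * s for s in shifted])
        out.append(z)
    return out


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy 2-d embeddings
    eye = [[1.0, 0.0], [0.0, 1.0]]            # identity projections
    k = 2
    # Hypothetical "learned" tables; in training these would be parameters.
    rel_k = [[0.1 * r, 0.0] for r in range(2 * k + 1)]
    rel_v = [[0.0, 0.1 * r] for r in range(2 * k + 1)]
    for row in relative_self_attention(x, eye, eye, eye, rel_k, rel_v, k):
        print([round(v, 3) for v in row])
```

Because each row of attention weights sums to one, adding the same vector to every row of `rel_v` shifts every output by that vector, and adding the same vector to every row of `rel_k` leaves the output unchanged. The position tables therefore influence attention only through how they differ across relative distances, which is what lets the model favor nearby characters or frames regardless of absolute position.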
Source: USPTO / EPO open patent data.