Minimum word error rate training for attention-based sequence-to-sequence models
US11107463B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Aug 1, 2019 |
| Grant date | Aug 31, 2021 |
| Priority date | — |
| Expiry date | Feb 3, 2040 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG10L2015/025
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer-readable storage media, for speech recognition using attention-based sequence-to-sequence models. In some implementations, audio data indicating acoustic characteristics of an utterance is received. A sequence of feature vectors indicative of the acoustic characteristics of the utterance is generated. The sequence of feature vectors is processed using a speech recognition model that has been trained using a loss function that uses N-best lists of decoded hypotheses, the speech recognition model including an encoder, an attention module, and a decoder. The encoder and decoder each include one or more recurrent neural network layers. A sequence of output vectors representing distributions over a predetermined set of linguistic units is obtained. A transcription for the utterance is obtained based on the sequence of output vectors. Data indicating the transcription of the utterance is provided.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.