Patent · US Active

Minimum word error rate training for attention-based sequence-to-sequence models

US11646019B2 · kind B2 · utility

Cited by: 5
References: 1
Claims: 18
Family size: 0

Key dates

Filing date: Jul 27, 2021
Grant date: May 9, 2023
Expiry date: Jul 27, 2041

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G10L2015/025
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer-readable storage media, for speech recognition using attention-based sequence-to-sequence models. In some implementations, audio data indicating acoustic characteristics of an utterance is received. A sequence of feature vectors indicative of the acoustic characteristics of the utterance is generated. The sequence of feature vectors is processed using a speech recognition model that has been trained using a loss function that uses N-best lists of decoded hypotheses, the speech recognition model including an encoder, an attention module, and a decoder. The encoder and decoder each include one or more recurrent neural network layers. A sequence of output vectors representing distributions over a predetermined set of linguistic units is obtained. A transcription for the utterance is obtained based on the sequence of output vectors. Data indicating the transcription of the utterance is provided.
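The abstract states only that the model is trained with "a loss function that uses N-best lists of decoded hypotheses." The sketch below shows one such loss. It assumes the formulation commonly described for minimum word error rate (MWER) training, and it is not presented as the claimed method.

The steps are as follows:

1. Renormalize the model's log-probabilities over the N-best list.
2. Count word errors for each hypothesis against the reference.
3. Subtract the list's mean error as a baseline.
4. Return the expected (baseline-adjusted) number of word errors.

The function names `word_errors` and `mwer_loss` are hypothetical. This plain-Python version only computes the loss value. In actual training the same arithmetic would run on framework tensors, so that gradients flow back into the hypothesis log-probabilities.

```python
import math
from typing import List, Sequence


def word_errors(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Levenshtein distance between two word sequences, where substitutions,
    insertions and deletions each cost 1."""
    # prev[j] holds the edit distance between the previous hyp prefix and ref[:j].
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete hyp word h
                curr[j - 1] + 1,         # insert ref word r
                prev[j - 1] + (h != r),  # substitute, or match at no cost
            )
        prev = curr
    return prev[-1]


def mwer_loss(log_probs: List[float], hyps: List[List[str]], ref: List[str]) -> float:
    """Expected word errors over an N-best list (assumed MWER formulation).

    Hypothesis probabilities are renormalized over the N-best list only, and
    the mean error of the list is subtracted as a baseline."""
    if not hyps or len(log_probs) != len(hyps):
        raise ValueError("need one log-probability per hypothesis")
    # Numerically stable softmax restricted to the N-best entries.
    m = max(log_probs)
    weights = [math.exp(lp - m) for lp in log_probs]
    total = sum(weights)
    probs = [w / total for w in weights]
    errors = [word_errors(h, ref) for h in hyps]
    mean_err = sum(errors) / len(errors)
    return sum(p * (e - mean_err) for p, e in zip(probs, errors))
```

Consider the reference "the cat sat" with three hypotheses scored 0.5, 0.3 and 0.2. The hypotheses have 0, 1 and 2 word errors, so the loss is 0.5·(0−1) + 0.3·(1−1) + 0.2·(2−1) = −0.3. The loss decreases as probability mass shifts toward hypotheses with fewer errors. The mean-error baseline leaves the expected value's gradient unchanged while reducing its variance.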

Source: USPTO / EPO open patent data (bibliographic data and citation counts).