Patent · US Active

Minimum word error rate training for attention-based sequence-to-sequence models

US11646019B2 · kind B2 · utility

Cited by	5

References	1

Claims	18

Family size	0

Assignee

Google LLC · US

Inventors

Rohit Prakash Prabhavalkar · Santa Clara, US
Tara N. Sainath · Jersey City, US
Yonghui Wu · Fremont, US
Patrick Nguyen · Kirkland, US
Zhifeng Chen · Sunnyvale, US
Chung-Cheng Chiu · Mountain View, US
Anjuli Patricia Kannan · Berkeley, US

Key dates

Filing date	Jul 27, 2021
Grant date	May 9, 2023
Priority date	—
Expiry date	Jul 27, 2041

Classification

Technology area (CPC G)	Physics
CPC primary	G10L2015/025
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer-readable storage media, for speech recognition using attention-based sequence-to-sequence models. In some implementations, audio data indicating acoustic characteristics of an utterance is received. A sequence of feature vectors indicative of the acoustic characteristics of the utterance is generated. The sequence of feature vectors is processed using a speech recognition model that has been trained using a loss function that uses N-best lists of decoded hypotheses, the speech recognition model including an encoder, an attention module, and a decoder. The encoder and decoder each include one or more recurrent neural network layers. A sequence of output vectors representing distributions over a predetermined set of linguistic units is obtained. A transcription for the utterance is obtained based on the sequence of output vectors. Data indicating the transcription of the utterance is provided.
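The abstract says only that the model is trained with a loss function that uses N-best lists of decoded hypotheses. As a minimal sketch of what such a loss might look like, assuming an expected-word-error form (one common way to realize minimum word error rate training, not necessarily the claimed formulation), the example below renormalizes hypothesis scores over the N-best list and weights each hypothesis's word error count, relative to the list average, by its renormalized probability. The function names `word_errors` and `nbest_expected_word_errors`, the sample hypotheses, and the scores are hypothetical illustrations and do not come from the patent.

```python
import math


def word_errors(hyp: str, ref: str) -> int:
    """Word-level Levenshtein distance between a hypothesis and a reference."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))  # distances from an empty hypothesis prefix
    for i, hw in enumerate(h, start=1):
        cur = [i]
        for j, rw in enumerate(r, start=1):
            cur.append(min(
                prev[j] + 1,               # drop the hypothesis word
                cur[j - 1] + 1,            # add the missing reference word
                prev[j - 1] + (hw != rw),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def nbest_expected_word_errors(hyps, log_probs, ref):
    """Expected word errors over an N-best list, relative to the list average.

    Assumption: log_probs are model scores log P(hyp | audio); they are
    renormalized so the N-best hypotheses sum to probability 1.
    """
    top = max(log_probs)
    weights = [math.exp(lp - top) for lp in log_probs]  # numerically stable
    total = sum(weights)
    posteriors = [w / total for w in weights]

    errors = [word_errors(h, ref) for h in hyps]
    mean_errors = sum(errors) / len(errors)  # baseline that reduces variance

    return sum(p * (e - mean_errors) for p, e in zip(posteriors, errors))


# Hypothetical 3-best list for one utterance.
reference = "the cat sat"
nbest = ["the cat sat", "the cat sad", "a cat sat down"]
uniform_loss = nbest_expected_word_errors(nbest, [-1.0, -1.0, -1.0], reference)
peaked_loss = nbest_expected_word_errors(nbest, [-0.1, -2.0, -3.0], reference)
```

In this sketch the value falls as probability mass shifts toward hypotheses with fewer word errors than the list average, which is the behavior a training procedure of this kind would reward. In an actual system the scores would come from the encoder, attention module, and decoder, and the loss would be differentiated with respect to the model parameters.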

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.