Patent · US · Active

Reducing streaming ASR model delay with self alignment

US12057124B2 · kind B2 · utility

Cited by	0
References	0
Claims	20
Family size	0

Assignee

Google LLC · US

Inventors

Jaeyoung Kim · Asan-si, KR
Han Lu · Santa Clara, US
Anshuman Tripathi · Singapore, SG
Qian Zhang · Cypress, US
Hasim Sak · New York, US

Key dates

Filing date	Dec 15, 2021
Grant date	Aug 6, 2024
Priority date	—
Expiry date	Dec 15, 2042

Classification

Technology area (CPC G)	Physics
CPC primary	G10L15/16
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A streaming speech recognition model includes an audio encoder configured to receive a sequence of acoustic frames and generate a higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The streaming speech recognition model also includes a label encoder configured to receive a sequence of non-blank symbols output by a final softmax layer and generate a dense representation. The streaming speech recognition model also includes a joint network configured to receive the higher order feature representation generated by the audio encoder and the dense representation generated by the label encoder and generate a probability distribution over possible speech recognition hypotheses. Here, the streaming speech recognition model is trained using self-alignment to reduce prediction delay by encouraging an alignment path that is one frame left from a reference forced-alignment frame.
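The abstract names three components (audio encoder, label encoder, joint network) and a self-alignment training signal, but gives no formulas. The sketch below is a minimal, hypothetical illustration under stated assumptions, not the patented method. It assumes an additive joint network (tanh over the sum of one encoder frame and one label-encoder output, then a linear projection with index 0 as blank). It also assumes that the self-alignment term can be approximated as a penalty equal to the negative log-probability of emitting each reference label one frame earlier than its forced-alignment frame. The names `joint_network`, `self_alignment_penalty`, `weight`, and the toy weights `W` are all illustrative.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def log_softmax(logits: Vector) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def joint_network(enc_frame: Vector, label_dense: Vector,
                  W: Sequence[Vector]) -> List[float]:
    """Hypothetical additive joint network: combine one audio-encoder frame
    with one label-encoder output, apply tanh, project to vocabulary logits
    (index 0 = blank), and return log-probabilities."""
    hidden = [math.tanh(a + b) for a, b in zip(enc_frame, label_dense)]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in W]
    return log_softmax(logits)


def self_alignment_penalty(lattice, labels, align_frames, weight=0.01):
    """Simplified self-alignment regularizer (an assumption, not the patented loss).

    lattice[t][u][k]: log P(k | frame t, u labels emitted so far)
    labels[u]:        vocabulary id of the (u+1)-th non-blank label
    align_frames[u]:  frame at which labels[u] is emitted in the reference
                      forced alignment
    Returns weight * sum_u -log P(labels[u] | frame align_frames[u] - 1, u),
    so minimizing it favors emitting each label one frame to the left.
    """
    total = 0.0
    for u, (k, t) in enumerate(zip(labels, align_frames)):
        t_left = max(t - 1, 0)  # cannot shift left of the first frame
        total -= lattice[t_left][u][k]
    return weight * total


# Toy usage: 4 frames, 2-dim features, vocabulary {0: blank, 1, 2}.
enc = [[0.1 * t, -0.1 * t] for t in range(4)]  # audio-encoder outputs per frame
dense = [[0.2 * u, 0.1] for u in range(3)]     # label-encoder outputs, u = 0..2
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]      # hypothetical projection weights
lattice = [[joint_network(e, d, W) for d in dense] for e in enc]
penalty = self_alignment_penalty(lattice, labels=[1, 2], align_frames=[2, 3])
# In training, a term like this would be added to the main transducer loss.
```

Scoring only the label-emission nodes of the left-shifted path keeps the sketch short; a fuller treatment would score the entire shifted path, including blanks, and derive `align_frames` from the model's own alignment rather than passing them in.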

Source: USPTO / EPO open patent data (bibliographic data and citation counts).