Patent · US Active

Fast emit low-latency streaming ASR with sequence-level emission regularization utilizing forward and backward probabilities between nodes of an alignment lattice

US12094453B2 · kind B2 · utility

Cited by	0
References	2
Claims	24
Family size	0

Assignee

Google LLC · US

Inventors

Jiahui Yu · Champaign, US
Chung-Cheng Chiu · Mountain View, US
Bo Li · Dongfeng Town, CN
Shuo-yiin Chang · Sunnyvale, US
Tara N. Sainath · Jersey City, US
Wei Han · Shanghai, CN
Anmol Gulati · New Delhi, IN
Yanzhang He · Mountain View, US
Arun Narayanan · Rochester Hills, US
Yonghui Wu · Fremont, US
Ruoming Pang · New York, US

Key dates

Filing date	Sep 9, 2021
Grant date	Sep 17, 2024
Priority date	—
Expiry date	Sep 25, 2042

Classification

Technology area (CPC G)	Physics
CPC primary	G10L15/187
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A computer-implemented method of training a streaming speech recognition model includes receiving, as input to the streaming speech recognition model, a sequence of acoustic frames. The streaming speech recognition model is configured to learn an alignment probability between the sequence of acoustic frames and an output sequence of vocabulary tokens. The vocabulary tokens include a plurality of label tokens and a blank token. At each output step, the method includes determining a first probability of emitting one of the label tokens and determining a second probability of emitting the blank token. The method also includes generating the alignment probability at a sequence level based on the first probability and the second probability. The method also includes applying a tuning parameter to the alignment probability at the sequence level to maximize the first probability of emitting one of the label tokens.
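
The steps in the abstract can be illustrated on a small alignment lattice. The following is a minimal Python sketch, not the claimed implementation. It assumes a standard transducer-style lattice with T acoustic frames and U label tokens, computes the forward and backward probabilities named in the title, and assumes the tuning parameter (called `lam` here) scales the label-emission term by `1 + lam`. The record does not state the exact formula, and all function and variable names are hypothetical.

```python
def forward_backward(label_prob, blank_prob):
    """Forward (alpha) and backward (beta) probabilities on a T x (U+1) lattice.

    label_prob[t][u]: probability of emitting the next label token at node (t, u)
    blank_prob[t][u]: probability of emitting the blank token at node (t, u)
    Assumption: a blank moves to (t+1, u), a label moves to (t, u+1),
    and a final blank at (T-1, U) ends the alignment.
    """
    T, U1 = len(blank_prob), len(blank_prob[0])
    alpha = [[0.0] * U1 for _ in range(T)]
    beta = [[0.0] * U1 for _ in range(T)]

    alpha[0][0] = 1.0
    for t in range(T):
        for u in range(U1):
            if t > 0:  # reached by a blank from (t-1, u)
                alpha[t][u] += alpha[t - 1][u] * blank_prob[t - 1][u]
            if u > 0:  # reached by a label from (t, u-1)
                alpha[t][u] += alpha[t][u - 1] * label_prob[t][u - 1]

    for t in reversed(range(T)):
        for u in reversed(range(U1)):
            if t == T - 1 and u == U1 - 1:
                beta[t][u] = blank_prob[t][u]  # terminating blank
                continue
            if t + 1 < T:
                beta[t][u] += blank_prob[t][u] * beta[t + 1][u]
            if u + 1 < U1:
                beta[t][u] += label_prob[t][u] * beta[t][u + 1]
    return alpha, beta


def node_alignment_prob(alpha, beta, label_prob, blank_prob, t, u, lam=0.0):
    """Probability mass of all alignments passing through node (t, u).

    Assumed regularization: the label branch is weighted by (1 + lam),
    so lam = 0 gives the unregularized value alpha[t][u] * beta[t][u].
    """
    T, U1 = len(alpha), len(alpha[0])
    if t + 1 < T:
        blank_branch = blank_prob[t][u] * beta[t + 1][u]
    elif u == U1 - 1:
        blank_branch = blank_prob[t][u]  # terminating blank at the last node
    else:
        blank_branch = 0.0
    label_branch = label_prob[t][u] * beta[t][u + 1] if u + 1 < U1 else 0.0
    return alpha[t][u] * (blank_branch + (1.0 + lam) * label_branch)


# Toy lattice: T = 2 frames, U = 1 label (made-up probabilities).
label_prob = [[0.6, 0.0], [0.7, 0.0]]
blank_prob = [[0.4, 1.0], [0.3, 1.0]]
alpha, beta = forward_backward(label_prob, blank_prob)
plain = node_alignment_prob(alpha, beta, label_prob, blank_prob, 0, 0)
boosted = node_alignment_prob(alpha, beta, label_prob, blank_prob, 0, 0, lam=0.5)
```

With `lam = 0` the node-level value reduces to the ordinary product of forward and backward probabilities. A positive `lam` increases the weight of paths that emit a label token at that node relative to paths that emit blank, which in this sketch is the mechanism that favors earlier emission.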

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.