Patent · US Active

Transformer transducer: one model unifying streaming and non-streaming speech recognition

US11741947B2 · kind B2 · utility

Cited by	2

References	0

Claims	27

Family size	0

Assignee

Google LLC · US

Inventors

Anshuman Tripathi · Singapore, SG
Hasim Sak · New York, US
Han Lu · Santa Clara, US
Qian Zhang · Cypress, US
Jaeyoung Kim · Asan-si, KR

Key dates

Filing date	Mar 23, 2021
Grant date	Aug 29, 2023
Priority date	—
Expiry date	Nov 11, 2041

Classification

Technology area (CPC G)	Physics
CPC primary	G10L15/30
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A transformer-transducer model for unifying streaming and non-streaming speech recognition includes an audio encoder, a label encoder, and a joint network. The audio encoder receives a sequence of acoustic frames, and generates, at each of a plurality of time steps, a higher order feature representation for a corresponding acoustic frame. The label encoder receives a sequence of non-blank symbols output by a final softmax layer, and generates, at each of the plurality of time steps, a dense representation. The joint network receives the higher order feature representation and the dense representation at each of the plurality of time steps, and generates a probability distribution over possible speech recognition hypotheses. The audio encoder of the model further includes a neural network having an initial stack of transformer layers trained with zero look ahead audio context, and a final stack of transformer layers trained with a variable look ahead audio context.
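The split between the two encoder stacks can be illustrated with per-layer self-attention masks. The sketch below is illustrative only and is not taken from the patent: the function names, the unlimited left context, and the choice to sample one look-ahead value per training example are all assumptions made for the example. It shows how an initial stack can be restricted to zero future frames while a final stack is allowed a variable number of future frames.

```python
import random


def attention_mask(num_frames, left_context, right_context):
    """Return mask[t][s] == True when frame t may attend to frame s.

    left_context=None means unlimited history (an assumption of this sketch).
    right_context is the number of future frames visible (the look-ahead).
    """
    mask = []
    for t in range(num_frames):
        lo = 0 if left_context is None else max(0, t - left_context)
        hi = min(num_frames - 1, t + right_context)
        mask.append([lo <= s <= hi for s in range(num_frames)])
    return mask


def encoder_masks(num_frames, num_initial, num_final, max_look_ahead, rng=None):
    """Build one mask per layer for a hypothetical two-stack audio encoder.

    Initial stack: zero look-ahead, so each frame sees only itself and the past.
    Final stack: a look-ahead drawn from [0, max_look_ahead]; sampling it once
    per training example is an assumption, not the patent's stated procedure.
    """
    rng = rng or random.Random(0)
    masks = [attention_mask(num_frames, None, 0) for _ in range(num_initial)]
    look_ahead = rng.randint(0, max_look_ahead)
    masks += [attention_mask(num_frames, None, look_ahead) for _ in range(num_final)]
    return masks, look_ahead


if __name__ == "__main__":
    masks, look_ahead = encoder_masks(
        num_frames=5, num_initial=2, num_final=1, max_look_ahead=3
    )
    print("sampled look-ahead for final stack:", look_ahead)
    for row in masks[-1]:
        print("".join("x" if visible else "." for visible in row))
```

Under these assumptions, the same trained weights could be run with a look-ahead of 0 in the final stack for low-latency streaming, or with a larger look-ahead when more future audio is available.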

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.