System and method for streaming end-to-end speech recognition with asynchronous decoders pruning prefixes using a joint label and frame information in transcribing technique
US11373639B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Dec 12, 2019 |
| Grant date | Jun 28, 2022 |
| Priority date | — |
| Expiry date | Nov 5, 2040 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L2015/223
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A speech recognition system successively processes each encoder state of encoded acoustic features with frame-synchronous decoder (FSD) and label-synchronous decoder (LSD) modules. Upon identifying an encoder state carrying information about new transcription output, the system expands a current list of FSD prefixes with the FSD module, evaluates the FSD prefixes with the LSD module, and prunes the FSD prefixes according to joint FSD and LSD scores. The FSD and LSD modules are synchronized by having the LSD module process the portion of the encoder states that includes the new transcription output identified by the FSD module and produce LSD scores for the FSD prefixes determined by the FSD module.
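The expand, evaluate, and prune cycle described in the abstract can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the function `expand_and_prune`, the `lsd_score` callback, the `beam_size` and `fsd_weight` parameters, and the choice of a weighted sum of log scores as the "joint" score. The abstract states only that pruning uses joint FSD and LSD scores, so the combination rule below is an assumption, and the FSD expansion is reduced to adding one per-label log probability rather than a full frame-synchronous prefix score.

```python
from typing import Callable, Dict, List, Tuple

Prefix = Tuple[str, ...]


def expand_and_prune(
    fsd_prefixes: Dict[Prefix, float],
    label_log_probs: Dict[str, float],
    lsd_score: Callable[[Prefix], float],
    beam_size: int = 2,
    fsd_weight: float = 0.5,
) -> List[Tuple[Prefix, float]]:
    """Toy version of one decoding step (all names are illustrative).

    fsd_prefixes: current prefixes mapped to their FSD log scores.
    label_log_probs: log probabilities of candidate next labels, standing in
        for what the FSD module would emit at the triggering encoder state.
    lsd_score: stand-in for the LSD module; returns a log score for a prefix.
    """
    # 1. Expand every current FSD prefix with every candidate label.
    expanded: Dict[Prefix, float] = {}
    for prefix, score in fsd_prefixes.items():
        for label, log_prob in label_log_probs.items():
            expanded[prefix + (label,)] = score + log_prob

    # 2. Evaluate each expanded prefix with the LSD stand-in and combine
    #    the two scores (assumed here: weighted sum of log scores).
    joint = {
        prefix: fsd_weight * fsd + (1.0 - fsd_weight) * lsd_score(prefix)
        for prefix, fsd in expanded.items()
    }

    # 3. Prune: keep only the beam_size prefixes with the best joint score.
    ranked = sorted(joint.items(), key=lambda item: item[1], reverse=True)
    return ranked[:beam_size]


if __name__ == "__main__":
    # Hypothetical LSD scorer that strongly prefers prefixes ending in "c".
    def toy_lsd(prefix: Prefix) -> float:
        return 0.0 if prefix[-1] == "c" else -10.0

    kept = expand_and_prune(
        fsd_prefixes={("a",): -1.0},
        label_log_probs={"b": -0.5, "c": -2.0},
        lsd_score=toy_lsd,
        beam_size=1,
    )
    print(kept)
```

In this toy run the FSD score alone would rank `("a", "b")` first, but the joint score keeps `("a", "c")` instead, which shows how the second decoder's evaluation can change which prefixes survive pruning.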
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.