Long-context end-to-end speech recognition system
US11978435B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Oct 13, 2020 |
| Grant date | May 7, 2024 |
| Priority date | — |
| Expiry date | Sep 26, 2041 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG10L2015/223
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
This invention relates generally to speech processing and more particularly to end-to-end automatic speech recognition (ASR) that utilizes long contextual information. Some embodiments of the invention provide a system and a method for end-to-end ASR suitable for recognizing long audio recordings such as lecture and conversational speeches. This disclosure includes a Transformer-based ASR system that utilizes contextual information, wherein the Transformer accepts multiple utterances at the same time and predicts transcript for the last utterance. This is repeated in a sliding-window fashion with one-utterance shifts to recognize the entire recording. In addition, some embodiments of the present invention may use acoustic and/or text features obtained from only the previous utterances spoken by the same speaker as the last utterance when the long audio recording includes multiple speakers.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.