Patent · US Active

Long-context end-to-end speech recognition system

US11978435B2 · kind B2 · utility

Cited by: 2
References: 3
Claims: 19
Family size: 0

Assignee

Inventors

Key dates

Filing date: Oct 13, 2020
Grant date: May 7, 2024
Priority date
Expiry date: Sep 26, 2041

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G10L2015/223
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

This invention relates generally to speech processing and more particularly to end-to-end automatic speech recognition (ASR) that uses long contextual information. Some embodiments of the invention provide a system and a method for end-to-end ASR suitable for recognizing long audio recordings such as lectures and conversational speech. This disclosure includes a Transformer-based ASR system that uses contextual information, wherein the Transformer accepts multiple utterances at the same time and predicts the transcript for the last utterance. This step is repeated in a sliding-window fashion with one-utterance shifts to recognize the entire recording. In addition, when the long audio recording includes multiple speakers, some embodiments of the present invention may use acoustic and/or text features obtained only from previous utterances spoken by the same speaker as the last utterance.
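
The sliding-window scheme described in the abstract can be sketched roughly as follows. This is an illustration only, not the patented implementation: the abstract does not state how many utterances a window holds or how features are combined, so `window_size`, the `Utterance` type, and the `recognize` callable (a stand-in for the Transformer that returns a transcript for the last utterance in its input) are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Sequence, Tuple


@dataclass
class Utterance:
    speaker: str  # hypothetical speaker label
    audio: str    # stand-in for the utterance's acoustic features


def context_windows(
    utterances: Sequence[Utterance],
    window_size: int,
    same_speaker_only: bool = False,
) -> Iterator[Tuple[List[Utterance], Utterance]]:
    """Yield (context, target) pairs, shifting by one utterance per step."""
    for i, target in enumerate(utterances):
        history = list(utterances[:i])
        if same_speaker_only:
            # Keep only earlier utterances from the target's speaker.
            history = [u for u in history if u.speaker == target.speaker]
        # Up to window_size - 1 preceding utterances serve as context.
        context = history[-(window_size - 1):] if window_size > 1 else []
        yield context, target


def transcribe_recording(
    utterances: Sequence[Utterance],
    recognize: Callable[[List[Utterance]], str],
    window_size: int = 3,
    same_speaker_only: bool = False,
) -> List[str]:
    """Transcribe every utterance, each time passing context plus target."""
    return [
        recognize(context + [target])
        for context, target in context_windows(
            utterances, window_size, same_speaker_only
        )
    ]
```

In this sketch the first utterances of a recording simply receive a shorter context, and with `same_speaker_only=True` an utterance by speaker B never sees speaker A's earlier utterances. Both choices are assumptions made for the example.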

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.