Patent · US Active

Long-context end-to-end speech recognition system

US11978435B2 · kind B2 · utility

Cited by	2

References	3

Claims	19

Family size	0

Assignee

Mitsubishi Electric Research Laboratories, Inc. · US

Inventors

Takaaki Hori · Lexington, US
Niko Moritz · Brookline, US
Chiori Hori · Koganei, JP
Jonathan Le Roux · Somerville, US

Key dates

Filing date	Oct 13, 2020
Grant date	May 7, 2024
Priority date	—
Expiry date	Sep 26, 2041

Classification

Technology area (CPC G)	Physics
CPC primary	G10L2015/223
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

This invention relates generally to speech processing, and more particularly to end-to-end automatic speech recognition (ASR) that uses long contextual information. Some embodiments provide a system and a method for end-to-end ASR suited to long audio recordings such as lectures and conversational speech. The disclosure includes a Transformer-based ASR system in which the Transformer accepts multiple utterances at the same time and predicts the transcript of the last utterance. This step is repeated in a sliding-window fashion, shifting by one utterance at a time, until the entire recording is recognized. In addition, when a long recording includes multiple speakers, some embodiments may use acoustic and/or text features obtained only from previous utterances spoken by the same speaker as the last utterance.
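
The sliding-window scheme in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Utterance` type, the `context_windows` and `recognize_recording` helpers, the `num_context` parameter, and the `recognize_last` callable (standing in for the Transformer) are all hypothetical names introduced here for illustration. The sketch only shows how each window might be assembled, with an optional same-speaker filter on the context.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Utterance:
    """Hypothetical container; `features` stands in for acoustic/text features."""
    speaker: str
    features: str


def context_windows(
    utterances: Sequence[Utterance],
    num_context: int = 2,
    same_speaker_only: bool = False,
) -> List[List[Utterance]]:
    """Build one window per utterance: up to `num_context` earlier
    utterances followed by the utterance to be transcribed (the last one).
    The window shifts by one utterance at each step."""
    windows = []
    for i, target in enumerate(utterances):
        history = list(utterances[:i])
        if same_speaker_only:
            # Assumption: keep only earlier utterances by the target's speaker.
            history = [u for u in history if u.speaker == target.speaker]
        # Guard against history[-0:], which would return the whole list.
        context = history[-num_context:] if num_context > 0 else []
        windows.append(context + [target])
    return windows


def recognize_recording(
    utterances: Sequence[Utterance],
    recognize_last: Callable[[List[Utterance]], str],
    num_context: int = 2,
    same_speaker_only: bool = False,
) -> List[str]:
    """Apply a model (any callable that transcribes the last utterance of a
    window) to every window, yielding one transcript per utterance."""
    return [
        recognize_last(window)
        for window in context_windows(utterances, num_context, same_speaker_only)
    ]
```

A real system would pass concatenated acoustic features (and possibly earlier hypotheses) to the model; here any callable that accepts a window and returns a string can serve as `recognize_last` for experimentation.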

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.