Patent · US Active

Deep learning models for speech recognition

US11562733B2 · kind B2 · utility

Cited by	1
References	12
Claims	20
Family size	0

Assignee

BAIDU USA LLC · US

Inventors

Awni Hannun · Los Altos, US
Carl Case · San Francisco, US
Jared Casper · Sunnyvale, US
Bryan Catanzaro · Cupertino, US
Gregory Diamos · San Jose, US
Erich Eisen · Mountain View, US
Ryan Prenger · Oakland, US
Sanjeev Satheesh · Sunnyvale, US
Shubhabrata Sengupta · Menlo Park, US
Adam Coates · Mountain View, US
Andrew Yan-Tak Ng · Palo Alto, US

Key dates

Filing date	Aug 15, 2019
Grant date	Jan 24, 2023
Priority date	—
Expiry date	Apr 13, 2040

Classification

Technology area (CPC G)	Physics
CPC primary	G10L15/26
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Presented herein are embodiments of state-of-the-art speech recognition systems developed using end-to-end deep learning. In embodiments, the model architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, embodiments of the system do not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learn a function that is robust to such effects. Neither a phoneme dictionary, nor even the concept of a “phoneme,” is needed. Embodiments include a well-optimized recurrent neural network (RNN) training system that can use multiple GPUs, as well as a set of novel data synthesis techniques that allows for a large amount of varied data for training to be efficiently obtained. Embodiments of the system can also handle challenging noisy environments better than widely used, state-of-the-art commercial speech systems.
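The abstract mentions data synthesis techniques for efficiently obtaining large amounts of varied training data, but it does not say how the synthesis works. Purely as a hedged illustration, one widely used form of speech data synthesis is superimposing a noise recording onto clean speech at a chosen signal-to-noise ratio (SNR). The sketch below assumes that technique. The helper names `rms` and `mix_at_snr`, the tiling of short noise clips, and the use of plain Python lists for audio samples are all hypothetical choices made for illustration; none of them is taken from the patent.

```python
import math


def rms(signal):
    """Root-mean-square amplitude of a non-empty list of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_snr(clean, noise, snr_db):
    """Superimpose `noise` onto `clean` so the added noise sits at `snr_db` dB SNR.

    Hypothetical helper for illustration only; not the patent's method.
    If the noise clip is shorter than the clean clip, it is tiled (repeated).
    """
    if not clean or not noise:
        raise ValueError("clean and noise must be non-empty")
    # Repeat or trim the noise so it covers every clean sample.
    tiled = [noise[i % len(noise)] for i in range(len(clean))]
    noise_rms = rms(tiled)
    if noise_rms == 0:
        # Silent noise adds nothing; return an unmodified copy.
        return list(clean)
    # SNR_dB = 20 * log10(rms(clean) / rms(scaled_noise)); solve for the scale.
    target_noise_rms = rms(clean) / (10 ** (snr_db / 20))
    scale = target_noise_rms / noise_rms
    return [c + scale * n for c, n in zip(clean, tiled)]
```

For example, mixing one second of a 440 Hz tone sampled at 16 kHz with a short burst of Gaussian noise at 10 dB yields a signal whose added component is exactly 10 dB below the tone. Drawing many noise clips and SNR values per utterance is one simple way such a routine could multiply the variety of a training set.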

Source: USPTO / EPO open patent data (bibliographic data and citation counts).