Systems and methods for speech transcription
US10540957B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jun 9, 2015 |
| Grant date | Jan 21, 2020 |
| Priority date | — |
| Expiry date | Sep 13, 2036 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L15/26
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Presented herein are embodiments of state-of-the-art speech recognition systems developed using end-to-end deep learning. In embodiments, the model architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, embodiments of the system do not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learn a function that is robust to such effects. Neither a phoneme dictionary, nor even the concept of a “phoneme,” is needed. Embodiments include a well-optimized recurrent neural network (RNN) training system that can use multiple GPUs, as well as a set of novel data synthesis techniques that allows for a large amount of varied data for training to be efficiently obtained. Embodiments of the system can also handle challenging noisy environments better than widely used, state-of-the-art commercial speech systems.
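The abstract does not spell out the data synthesis techniques. As an illustration only, the sketch below assumes one common form of synthesis: superimposing a noise clip on a clean utterance at a chosen signal-to-noise ratio (SNR), so that one clean recording can yield many noisy training examples. The function names `rms` and `mix_at_snr`, the 16 kHz sample rate, and the toy sine-wave "speech" are all hypothetical and are not taken from the patent.

```python
import math
import random


def rms(signal):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mixture has the requested SNR in dB.

    Assumption: additive superposition; the patent abstract does not
    specify this formula.
    """
    if len(noise) < len(speech):
        raise ValueError("noise clip must be at least as long as the speech clip")
    noise = noise[:len(speech)]
    noise_rms = rms(noise)
    if noise_rms == 0:
        # Silent noise clip: nothing to add.
        return list(speech)
    # SNR_dB = 20 * log10(rms_speech / rms_scaled_noise), solved for the noise level.
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [s + gain * n for s, n in zip(speech, noise)]


# Toy stand-ins for real audio: one second of a 440 Hz tone and Gaussian noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

noisy = mix_at_snr(speech, noise, snr_db=10.0)
```

Calling `mix_at_snr` repeatedly with different noise clips and SNR values is one way a single clean utterance could be expanded into varied training data; a real pipeline would read recorded audio rather than generate tones.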
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.