Systems and methods for robust speech recognition using generative adversarial networks
US10971142B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Oct 8, 2018 |
| Grant date | Apr 6, 2021 |
| Priority date | — |
| Expiry date | Mar 29, 2039 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L2015/0631
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Described herein are systems and methods for a general, scalable, end-to-end framework that uses a generative adversarial network (GAN) objective to enable robust speech recognition. Encoders trained with the proposed approach enjoy improved invariance by learning to map noisy audio to the same embedding space as that of clean audio. Embodiments of a Wasserstein GAN framework increase the robustness of seq-to-seq models in a scalable, end-to-end fashion. In one or more embodiments, an encoder component is treated as the generator of the GAN and is trained to produce indistinguishable embeddings between labeled and unlabeled audio samples. This new robust training approach can learn to induce robustness without alignment or a complicated inference pipeline, even where augmentation of audio data is not possible.
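The objective described in the abstract can be illustrated with a minimal sketch. It assumes a scalar linear critic over fixed-size embeddings and uses the weight clipping of the original Wasserstein GAN formulation; the abstract specifies neither. All names here (`critic_score`, `wasserstein_gap`, `encoder_objective`, `clip_weights`, and the weight `lam`) are hypothetical and are not taken from the patent.

```python
# Illustrative sketch only: a linear critic and hand-written embeddings stand in
# for the neural encoder and critic; none of these names come from the patent.

def critic_score(critic_w, z):
    """Scalar critic output: dot product of critic weights with embedding z."""
    return sum(w * zi for w, zi in zip(critic_w, z))

def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)

def wasserstein_gap(critic_w, clean_emb, noisy_emb):
    """Mean critic score on clean embeddings minus mean score on noisy ones."""
    clean = mean(critic_score(critic_w, z) for z in clean_emb)
    noisy = mean(critic_score(critic_w, z) for z in noisy_emb)
    return clean - noisy

def critic_loss(critic_w, clean_emb, noisy_emb):
    """The critic tries to maximize the gap, so it minimizes the negated gap."""
    return -wasserstein_gap(critic_w, clean_emb, noisy_emb)

def encoder_objective(task_loss, critic_w, clean_emb, noisy_emb, lam=1.0):
    """Encoder (the generator) minimizes the seq-to-seq task loss plus the gap.

    `lam` is an assumed weighting hyperparameter.
    """
    return task_loss + lam * wasserstein_gap(critic_w, clean_emb, noisy_emb)

def clip_weights(critic_w, c):
    """Weight clipping (assumed here) keeps the critic roughly Lipschitz."""
    return [max(-c, min(c, w)) for w in critic_w]

if __name__ == "__main__":
    clean = [[1.0, 0.0], [1.0, 2.0]]   # embeddings of clean audio (made up)
    noisy = [[0.0, 0.0], [0.0, 2.0]]   # embeddings of noisy audio (made up)
    w = [0.5, -0.25]
    print("critic loss:", critic_loss(w, clean, noisy))
    print("encoder objective:", encoder_objective(1.0, w, clean, noisy, lam=2.0))
```

In a full training loop the critic and encoder updates would alternate. When the encoder maps noisy and clean audio to the same embeddings, the gap is zero and only the task loss remains.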
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.