Patent · US Active

Systems and methods for robust speech recognition using generative adversarial networks

US10971142B2 · kind B2 · utility

Cited by	3
References	1
Claims	20
Family size	0

Assignee

BAIDU USA LLC · US

Inventors

Anuroop Sriram · Sunnyvale, US
Hee Woo Jun · Sunnyvale, US
Yashesh Gaur · Redmond, US
Sanjeev Satheesh · Sunnyvale, US

Key dates

Filing date	Oct 8, 2018
Grant date	Apr 6, 2021
Priority date	—
Expiry date	Mar 29, 2039

Classification

Technology area (CPC G)	Physics
CPC primary	G10L2015/0631
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Described herein are systems and methods for a general, scalable, end-to-end framework that uses a generative adversarial network (GAN) objective to enable robust speech recognition. Encoders trained with the proposed approach enjoy improved invariance by learning to map noisy audio to the same embedding space as that of clean audio. Embodiments of a Wasserstein GAN framework increase the robustness of seq-to-seq models in a scalable, end-to-end fashion. In one or more embodiments, an encoder component is treated as the generator of the GAN and is trained to produce embeddings that are indistinguishable between labeled and unlabeled audio samples. This new robust training approach can learn to induce robustness without alignment or a complicated inference pipeline, even where augmentation of audio data is not possible.
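
The objective described in the abstract can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the patented implementation: the linear encoder and critic, the names `encode`, `critic_step`, `wasserstein_estimate`, and `encoder_objective`, the weight `lam`, the clipping bound, and the synthetic data are all hypothetical. It shows only the general Wasserstein GAN pattern: a critic scores embeddings from two sets of audio (for example clean and noisy), and the encoder, acting as the generator, is penalized by the critic's estimated gap in addition to its recognition loss.

```python
import random

def encode(enc_w, frame):
    """Toy linear encoder (hypothetical): feature frame -> embedding."""
    return [sum(w * x for w, x in zip(row, frame)) for row in enc_w]

def critic(critic_w, z):
    """Toy linear critic f(z) (hypothetical): embedding -> scalar score."""
    return sum(w * v for w, v in zip(critic_w, z))

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def wasserstein_estimate(critic_w, emb_a, emb_b):
    """Critic's estimate of the gap: mean f(z_a) - mean f(z_b)."""
    score_a = sum(critic(critic_w, z) for z in emb_a) / len(emb_a)
    score_b = sum(critic(critic_w, z) for z in emb_b) / len(emb_b)
    return score_a - score_b

def critic_step(critic_w, emb_a, emb_b, lr=0.1, clip=0.05):
    """One gradient-ascent step on the estimate, then weight clipping.

    For a linear critic the gradient with respect to its weights is
    mean(emb_a) - mean(emb_b). Clipping is the crude Lipschitz constraint
    of the original WGAN formulation (an assumption here).
    """
    grad = [a - b for a, b in zip(mean_vector(emb_a), mean_vector(emb_b))]
    stepped = [w + lr * g for w, g in zip(critic_w, grad)]
    return [max(-clip, min(clip, w)) for w in stepped]

def encoder_objective(task_loss, critic_w, emb_a, emb_b, lam=1.0):
    """Encoder (generator) loss: recognition loss plus weighted critic gap."""
    return task_loss + lam * wasserstein_estimate(critic_w, emb_a, emb_b)

# Synthetic stand-ins for two sets of audio features (assumed data).
rng = random.Random(0)
enc_w = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
frames_a = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(8)]
frames_b = [[x + rng.gauss(0.5, 0.3) for x in f] for f in frames_a]

emb_a = [encode(enc_w, f) for f in frames_a]
emb_b = [encode(enc_w, f) for f in frames_b]

critic_w = critic_step([0.0, 0.0, 0.0], emb_a, emb_b)
gap = wasserstein_estimate(critic_w, emb_a, emb_b)
total = encoder_objective(task_loss=1.0, critic_w=critic_w,
                          emb_a=emb_a, emb_b=emb_b)
```

In an actual training loop the critic and encoder updates would alternate, and the encoder's weights would be adjusted by gradient descent on `total` so that the critic can no longer separate the two sets of embeddings. The sketch stops at computing the objective.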

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.