Patent · US Active

Systems and methods for robust speech recognition using generative adversarial networks

US10971142B2 · kind B2 · utility

Cited by: 3
References: 1
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Oct 8, 2018
Grant date: Apr 6, 2021
Priority date
Expiry date: Mar 29, 2039

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G10L2015/0631
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Described herein are systems and methods for a general, scalable, end-to-end framework that uses a generative adversarial network (GAN) objective to enable robust speech recognition. Encoders trained with the proposed approach enjoy improved invariance by learning to map noisy audio to the same embedding space as that of clean audio. Embodiments of a Wasserstein GAN framework increase the robustness of seq-to-seq models in a scalable, end-to-end fashion. In one or more embodiments, an encoder component is treated as the generator of the GAN and is trained to produce indistinguishable embeddings between labeled and unlabeled audio samples. This new robust training approach can learn to induce robustness without alignment or a complicated inference pipeline, even where augmentation of audio data is not possible.
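
The adversarial objective described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the single-layer encoder, the linear critic, the feature and embedding sizes, and all function names are hypothetical, and weight clipping (the Lipschitz constraint from the original Wasserstein GAN formulation) is assumed here only for illustration. The sketch shows the two opposing losses: the critic tries to widen the score gap between embeddings of labeled and unlabeled audio, while the encoder, acting as the generator, tries to close it.

```python
import numpy as np

def encode(features, enc_weights):
    """Hypothetical encoder: one tanh layer mapping audio features to embeddings."""
    return np.tanh(features @ enc_weights)

def critic_score(embeddings, critic_weights):
    """Hypothetical linear critic: one scalar score per embedding."""
    return embeddings @ critic_weights

def wasserstein_estimate(emb_a, emb_b, critic_weights):
    """Critic's estimate of the distance between two batches of embeddings."""
    return (critic_score(emb_a, critic_weights).mean()
            - critic_score(emb_b, critic_weights).mean())

def clip_weights(critic_weights, bound=0.05):
    """Weight clipping keeps the critic roughly Lipschitz (assumed constraint)."""
    return np.clip(critic_weights, -bound, bound)

rng = np.random.default_rng(0)
enc_w = 0.1 * rng.normal(size=(40, 16))        # 40 input features, 16-dim embedding
critic_w = clip_weights(rng.normal(size=16))

labeled = rng.normal(size=(8, 40))                     # stand-in for labeled audio features
unlabeled = labeled + 0.5 * rng.normal(size=(8, 40))   # stand-in for unlabeled (e.g. noisy) audio

gap = wasserstein_estimate(encode(labeled, enc_w), encode(unlabeled, enc_w), critic_w)

critic_loss = -gap        # the critic is trained to maximize the gap
encoder_adv_loss = gap    # the encoder (generator) is trained to minimize it
```

In a full system the encoder's adversarial loss would be added to the usual seq-to-seq recognition loss on the labeled audio, and the critic and encoder would be updated in alternation; both steps are omitted from this sketch.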

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.