Speech denoising via discrete representation learning
US11875809B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Oct 1, 2020 |
| Grant date | Jan 16, 2024 |
| Priority date | — |
| Expiry date | Jun 23, 2041 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG06N5/01
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
Developed and presented herein are embodiments of a new end-to-end approach for audio denoising, from a synthesis perspective. Instead of explicitly modelling the noise component in the input signal, embodiments directly synthesize the denoised audio from a generative model (or vocoder), as in text-to-speech systems. In one or more embodiments, to generate the phonetic contents for the autoregressive generative model, it is learned via a variational autoencoder with discrete latent representations. Furthermore, in one or more embodiments, a new matching loss is presented for the denoising purpose, which is masked on when the corresponding latent codes differ. As compared against other method on test datasets, embodiments achieve competitive performance and can be trained from scratch.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.