Patent · US Active

Small-footprint flow-based models for raw audio

US11521592B2 · kind B2 · utility

1Cited by
0References
20Claims
0Family size

Assignee

Inventors

Key dates

Filing dateAug 5, 2020
Grant dateDec 6, 2022
Priority date
Expiry dateAug 5, 2040

Classification

  • Technology area (CPC G)Physics
  • CPC primaryG10L25/30
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

WaveFlow is a small-footprint generative flow for raw audio, which may be directly trained with maximum likelihood. WaveFlow handles the long-range structure of waveform with a dilated two-dimensional (2D) convolutional architecture, while modeling the local variations using expressive autoregressive functions. WaveFlow may provide a unified view of likelihood-based models for raw audio, including WaveNet and WaveGlow, which may be considered special cases. It generates high-fidelity speech, while synthesizing several orders of magnitude faster than existing systems since it uses only a few sequential steps to generate relatively long waveforms. WaveFlow significantly reduces the likelihood gap that has existed between autoregressive models and flow-based models for efficient synthesis. Its small footprint with 5.91M parameters makes it 15 times smaller than some existing models. WaveFlow can generate 22.05 kHz high-fidelity audio 42.6× faster than real-time on a V100 graphics processing units (GPU) without using engineered inference kernels.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.