Patent · US Active

Small-footprint flow-based models for raw audio

US11521592B2 · kind B2 · utility

Cited by	1
References	0
Claims	20
Family size	0

Assignee

BAIDU USA LLC · US

Inventors

Wei Ping · Sunnyvale, US
Kainan Peng · Sunnyvale, US
Kexin Zhao · Santa Clara, US
Zhao Song · Princeton, US

Key dates

Filing date	Aug 5, 2020
Grant date	Dec 6, 2022
Priority date	—
Expiry date	Aug 5, 2040

Classification

Technology area (CPC G)	Physics
CPC primary	G10L25/30
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

WaveFlow is a small-footprint generative flow for raw audio, which may be directly trained with maximum likelihood. WaveFlow handles the long-range structure of waveforms with a dilated two-dimensional (2D) convolutional architecture, while modeling the local variations using expressive autoregressive functions. WaveFlow may provide a unified view of likelihood-based models for raw audio, including WaveNet and WaveGlow, which may be considered special cases. It generates high-fidelity speech while synthesizing several orders of magnitude faster than existing systems, because it uses only a few sequential steps to generate relatively long waveforms. WaveFlow significantly reduces the likelihood gap that has existed between autoregressive models and flow-based models for efficient synthesis. Its small footprint of 5.91M parameters makes it 15 times smaller than some existing models. WaveFlow can generate 22.05 kHz high-fidelity audio 42.6× faster than real-time on a V100 graphics processing unit (GPU) without using engineered inference kernels.
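The figures quoted in the abstract can be turned into rough throughput and size estimates, and the idea of generating a long waveform in "a few sequential steps" can be illustrated with a toy reshape. The following is a minimal sketch, not the patented implementation: the function names are hypothetical, and the column-major fold into a small number of rows is an assumption about how a 1D waveform might be arranged for 2D convolution, since the abstract does not specify the layout or the number of rows.

```python
# Back-of-the-envelope estimates using only numbers quoted in the abstract.
SAMPLE_RATE_HZ = 22_050    # 22.05 kHz audio
REALTIME_FACTOR = 42.6     # synthesis speed relative to real time (V100 GPU)
PARAMS_MILLIONS = 5.91     # WaveFlow parameter count, in millions
SIZE_RATIO = 15            # "15 times smaller than some existing models"


def samples_per_second(sample_rate_hz, realtime_factor):
    """Audio samples produced per wall-clock second of synthesis."""
    return sample_rate_hz * realtime_factor


def seconds_to_synthesize(audio_seconds, realtime_factor):
    """Wall-clock time needed to generate `audio_seconds` of audio."""
    return audio_seconds / realtime_factor


def comparison_model_params_millions(params_millions, size_ratio):
    """Implied size (in millions of parameters) of the larger model."""
    return params_millions * size_ratio


def fold_column_major(waveform, height):
    """Fold a 1D sequence into `height` rows so that adjacent samples share
    a column. ASSUMPTION: this layout is illustrative only; the abstract
    does not describe how the 2D arrangement is built."""
    if len(waveform) % height != 0:
        raise ValueError("waveform length must be a multiple of height")
    width = len(waveform) // height
    return [[waveform[col * height + row] for col in range(width)]
            for row in range(height)]


if __name__ == "__main__":
    print(samples_per_second(SAMPLE_RATE_HZ, REALTIME_FACTOR))   # about 939,330
    print(seconds_to_synthesize(10.0, REALTIME_FACTOR))          # under 0.25 s
    print(comparison_model_params_millions(PARAMS_MILLIONS, SIZE_RATIO))
    print(fold_column_major(list(range(8)), height=4))
```

Under the assumed layout, a model that is autoregressive only across rows would need one sequential step per row rather than one per sample, which is one way to read the abstract's claim about needing few sequential steps for long waveforms.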

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.