Small-footprint flow-based models for raw audio
US11521592B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Aug 5, 2020 |
| Grant date | Dec 6, 2022 |
| Priority date | — |
| Expiry date | Aug 5, 2040 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L25/30
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
WaveFlow is a small-footprint generative flow for raw audio, which may be directly trained with maximum likelihood. WaveFlow handles the long-range structure of the waveform with a dilated two-dimensional (2D) convolutional architecture, while modeling the local variations using expressive autoregressive functions. WaveFlow may provide a unified view of likelihood-based models for raw audio, including WaveNet and WaveGlow, which may be considered special cases. It generates high-fidelity speech, while synthesizing several orders of magnitude faster than existing systems since it uses only a few sequential steps to generate relatively long waveforms. WaveFlow significantly reduces the likelihood gap that has existed between autoregressive models and flow-based models for efficient synthesis. Its small footprint with 5.91M parameters makes it 15 times smaller than some existing models. WaveFlow can generate 22.05 kHz high-fidelity audio 42.6× faster than real time on a V100 graphics processing unit (GPU) without using engineered inference kernels.
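The abstract does not spell out how the 2D view of the waveform is formed or why only a few sequential steps are needed. The sketch below is an illustration under stated assumptions, not the patented implementation: it assumes the 1D waveform is "squeezed" column by column into a matrix with a small number of rows, and that generation proceeds one row at a time, so the number of sequential steps equals the row count rather than the sample count. The function names `squeeze` and `sequential_steps` and the height of 4 are hypothetical. The last lines work out the throughput implied by the 22.05 kHz and 42.6× figures from the abstract.

```python
def squeeze(waveform, height):
    """Reshape a 1D list of samples into `height` rows (assumed layout).

    Adjacent samples land in the same column, so each row holds samples
    that are `height` steps apart in time.
    """
    if len(waveform) % height != 0:
        raise ValueError("waveform length must be a multiple of height")
    width = len(waveform) // height
    return [
        [waveform[col * height + row] for col in range(width)]
        for row in range(height)
    ]


def sequential_steps(num_samples, height=None):
    """Count sequential generation steps.

    A sample-by-sample autoregressive model needs one step per sample.
    Under the row-by-row assumption, only `height` steps are needed,
    regardless of how long the waveform is.
    """
    return num_samples if height is None else height


# Toy waveform: 8 samples, hypothetical height of 4 rows.
toy = list(range(8))
matrix = squeeze(toy, height=4)
for row in matrix:
    print(row)

print(sequential_steps(len(toy)))            # sample-by-sample
print(sequential_steps(len(toy), height=4))  # row-by-row (assumed)

# Throughput implied by the abstract's figures.
SAMPLE_RATE_HZ = 22_050   # 22.05 kHz, from the abstract
REALTIME_FACTOR = 42.6    # from the abstract
samples_per_second = SAMPLE_RATE_HZ * REALTIME_FACTOR
print(round(samples_per_second))  # about 939 thousand samples per second
```

With this layout the toy waveform becomes the rows `[0, 4]`, `[1, 5]`, `[2, 6]`, `[3, 7]`. The sequential step count stays at the row count however wide the matrix grows, which is one way to read the abstract's "only a few sequential steps to generate relatively long waveforms".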
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.