Patent · US Active

Spectrogram to waveform synthesis using convolutional networks

US11462209B2 · kind B2 · utility

3Cited by

0References

20Claims

0Family size

Assignee

BAIDU USA LLC · US

Inventors

Sercan Omer Arik · San Francisco, US
Hee Woo Jun · Sunnyvale, US
Eric Undersander · San Francisco, US
Gregory Diamos · San Jose, US

Key dates

Filing date	Mar 27, 2019
Grant date	Oct 4, 2022
Priority date	—
Expiry date	Feb 21, 2041

Classification

Technology area (CPC G)Physics
CPC primaryG10L25/30
WIPO fieldComputer technology
WIPO sectorElectrical engineering

Abstract

For the problem of waveform synthesis from spectrograms, presented herein are embodiments of an efficient neural network architecture, based on transposed convolutions to achieve a high compute intensity and fast inference. In one or more embodiments, for training of the convolutional vocoder architecture, losses are used that are related to perceptual audio quality, as well as a GAN framework to guide with a critic that discerns unrealistic waveforms. While yielding a high-quality audio, embodiments of the model can achieve more than 500 times faster than real-time audio synthesis. Multi-head convolutional neural network (MCNN) embodiments for waveform synthesis from spectrograms are also disclosed. MCNN embodiments enable significantly better utilization of modern multi-core processors than commonly-used iterative algorithms like Griffin-Lim and yield very fast (more than 300× real-time) waveform synthesis. Embodiments herein yield high-quality speech synthesis, without any iterative algorithms or autoregression in computations.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.