Patent · US Active

Using recurrent neural network for partitioning of audio data into segments that each correspond to a speech feature cluster identifier

US10902843B2 · kind B2 · utility

Cited by	0
References	13
Claims	20
Family size	0

Assignee

International Business Machines Corporation · US

Inventors

Dimitrios Dimitriadis · Rutherford, US
David C. Haws · New York, US
Michael A. Picheny · White Plains, US
George Andrei Saon · Stamford, US
Samuel Thomas · White Plains, US

Key dates

Filing date	Nov 15, 2019
Grant date	Jan 26, 2021
Priority date	—
Expiry date	Nov 15, 2039

Classification

Technology area (CPC G)	Physics
CPC primary	G10L17/18
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Audio features, such as perceptual linear prediction (PLP) features and time derivatives thereof, are extracted from frames of training audio data that include speech by multiple speakers as well as silence, such as by using linear discriminant analysis (LDA). The frames are clustered into k-means clusters using distance measures, such as Mahalanobis distance measures, of means and variances of the extracted audio features of the frames. A recurrent neural network (RNN) is trained on the extracted audio features of the frames and the cluster identifiers of the k-means clusters into which the frames have been clustered. The RNN is applied to audio data to partition it into segments that each correspond to one of the cluster identifiers. Each segment can be assigned a label corresponding to one of the cluster identifiers. Speech recognition can be performed on the segments.
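
One possible reading of the clustering step is k-means in which each cluster keeps a mean and a per-dimension variance, and each frame is assigned to the cluster with the smallest Mahalanobis distance under a diagonal covariance. The sketch below assumes that reading and operates on already-extracted feature vectors (for example PLP features plus deltas after an LDA projection) given as plain lists. The function names, the evenly spaced initialization, the variance floor, and the iteration limit are hypothetical choices, not details taken from the patent.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def diag_mahalanobis(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Mahalanobis distance assuming a diagonal covariance (per-dimension variances)."""
    return math.sqrt(sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var)))


def kmeans_mahalanobis(frames: Sequence[Sequence[float]], k: int, iters: int = 20,
                       var_floor: float = 1e-3) -> Tuple[List[int], List[Vector], List[Vector]]:
    """Cluster feature frames; return (cluster id per frame, cluster means, cluster variances)."""
    dim = len(frames[0])
    # Hypothetical initialization: k frames spread evenly through the data, unit variances.
    step = max(1, len(frames) // k)
    means = [list(frames[min(i * step, len(frames) - 1)]) for i in range(k)]
    variances = [[1.0] * dim for _ in range(k)]
    labels = [0] * len(frames)
    for _ in range(iters):
        # Assignment step: nearest cluster under the diagonal Mahalanobis distance.
        new_labels = [min(range(k), key=lambda c: diag_mahalanobis(f, means[c], variances[c]))
                      for f in frames]
        # Update step: re-estimate each cluster's mean and (floored) variance.
        for c in range(k):
            members = [f for f, lab in zip(frames, new_labels) if lab == c]
            if not members:
                continue  # keep the previous estimate for an empty cluster
            n = len(members)
            means[c] = [sum(f[d] for f in members) / n for d in range(dim)]
            variances[c] = [max(var_floor, sum((f[d] - means[c][d]) ** 2 for f in members) / n)
                            for d in range(dim)]
        # Stop once assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, means, variances
```

The variance floor keeps the distance finite when a cluster's frames are nearly identical in some dimension; the resulting per-frame cluster ids are what the abstract describes as training targets for the RNN.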

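The last steps in the abstract, applying the RNN to audio data and giving each segment a cluster-identifier label, can be illustrated with a post-processing sketch. It assumes the trained RNN emits, for every frame, one score or posterior per cluster identifier; taking the argmax per frame yields a cluster id, and runs of consecutive frames with the same id become labeled segments. The RNN itself is not shown, and the `Segment` type, the function names, and the 10 ms default frame shift are hypothetical.

```python
from typing import List, NamedTuple, Sequence


class Segment(NamedTuple):
    cluster_id: int   # cluster identifier used as the segment label
    start_s: float    # start time in seconds
    end_s: float      # end time in seconds (exclusive)


def frame_ids_from_posteriors(posteriors: Sequence[Sequence[float]]) -> List[int]:
    """Pick the most probable cluster id for each frame (argmax over per-frame RNN outputs)."""
    return [max(range(len(p)), key=p.__getitem__) for p in posteriors]


def segments_from_frame_ids(ids: Sequence[int], frame_shift_s: float = 0.01) -> List[Segment]:
    """Merge runs of consecutive frames sharing a cluster id into labeled segments.

    frame_shift_s is a hypothetical default (10 ms between frame starts).
    """
    segments: List[Segment] = []
    start = 0
    for i in range(1, len(ids) + 1):
        # Close the current run at the end of the input or when the id changes.
        if i == len(ids) or ids[i] != ids[start]:
            segments.append(Segment(ids[start], start * frame_shift_s, i * frame_shift_s))
            start = i
    return segments
```

Each resulting segment could then be passed to a speech recognizer on its own, which is the final step the abstract mentions.
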
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.