Patent · US Active

Using recurrent neural network for partitioning of audio data into segments that each correspond to a speech feature cluster identifier

US10902843B2 · kind B2 · utility

Cited by: 0
References: 13
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Nov 15, 2019
Grant date: Jan 26, 2021
Priority date
Expiry date: Nov 15, 2039

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G10L17/18
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Audio features, such as perceptual linear prediction (PLP) features and time derivatives thereof, are extracted from frames of training audio data including speech by multiple speakers, and silence, such as by using linear discriminant analysis (LDA). The frames are clustered into k-means clusters using distance measures, such as Mahalanobis distance measures, of means and variances of the extracted audio features of the frames. A recurrent neural network (RNN) is trained on the extracted audio features of the frames and cluster identifiers of the k-means clusters into which the frames have been clustered. The RNN is applied to audio data to segment audio data into segments that each correspond to one of the cluster identifiers. Each segment can be assigned a label corresponding to one of the cluster identifiers. Speech recognition can be performed on the segments.
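
Illustrative sketch (not part of the patent text): the final steps of the abstract, assigning each frame a cluster identifier and grouping consecutive frames into segments, can be shown with a small Python example. It rests on several assumptions. The RNN is not implemented: a nearest-centroid rule using a diagonal-covariance Mahalanobis distance stands in for its per-frame prediction. The function names, the toy 2-D feature values, the centroids, and the variances are all hypothetical and are not real PLP features.

```python
from itertools import groupby
from math import sqrt


def mahalanobis_diag(x, mean, var):
    """Mahalanobis distance assuming a diagonal covariance (per-dimension variances)."""
    return sqrt(sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var)))


def assign_clusters(frames, centroids, variances):
    """Return, for each frame, the index of the nearest centroid.

    Stand-in for the trained RNN's per-frame cluster prediction (assumption).
    """
    return [
        min(range(len(centroids)),
            key=lambda k: mahalanobis_diag(f, centroids[k], variances))
        for f in frames
    ]


def frames_to_segments(cluster_ids):
    """Collapse per-frame cluster identifiers into (start, end_exclusive, cluster_id) runs."""
    segments = []
    start = 0
    for cid, run in groupby(cluster_ids):
        length = sum(1 for _ in run)
        segments.append((start, start + length, cid))
        start += length
    return segments


# Toy 2-D "features": hypothetical values chosen only for illustration.
frames = [(0.1, 0.0), (0.2, 0.1), (5.0, 5.1), (4.9, 5.0), (0.0, 0.2)]
centroids = [(0.0, 0.0), (5.0, 5.0)]
variances = (1.0, 1.0)

ids = assign_clusters(frames, centroids, variances)
segments = frames_to_segments(ids)
```

Each resulting tuple covers a run of consecutive frames that share one cluster identifier, which is the unit the abstract describes as receiving a label and being passed to speech recognition.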

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.