Patent · US Active

Frame-level combination of deep neural network and Gaussian mixture models

US9240184B1 · kind B1 · utility

Cited by: 37
References: 47
Claims: 23
Family size: 0

Assignee

Inventors

Key dates

Filing date: Feb 12, 2013
Grant date: Jan 19, 2016
Priority date
Expiry date: Dec 5, 2033

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G10L15/142
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method and system for frame-level merging of hidden Markov model (HMM) state predictions determined by different techniques is disclosed. An audio input signal may be transformed into a first and a second sequence of feature vectors, the sequences corresponding to each other and to a temporal sequence of frames of the audio input signal on a frame-by-frame basis. The first sequence may be processed by a neural network (NN) to determine NN-based state predictions, and the second sequence may be processed by a Gaussian mixture model (GMM) to determine GMM-based state predictions. The NN-based and GMM-based state predictions may be merged as weighted sums for each of a plurality of HMM states on a frame-by-frame basis to determine merged state predictions. The merged state predictions may then be applied to the HMMs to determine speech content of the audio input signal.
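
The merging step described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the function name, the per-state weight vectors, and the score values are all hypothetical, and the abstract does not say how the weights are chosen or whether the predictions are probabilities, likelihoods, or log-domain scores. The sketch only shows a weighted sum computed for each HMM state, one frame at a time.

```python
from typing import List, Sequence


def merge_state_predictions(
    nn_scores: Sequence[Sequence[float]],
    gmm_scores: Sequence[Sequence[float]],
    nn_weights: Sequence[float],
    gmm_weights: Sequence[float],
) -> List[List[float]]:
    """Merge two per-frame, per-state score tables as weighted sums.

    nn_scores[t][s] and gmm_scores[t][s] are the (hypothetical) scores
    for HMM state s at frame t; nn_weights[s] and gmm_weights[s] are
    assumed per-state weights.
    """
    if len(nn_scores) != len(gmm_scores):
        raise ValueError("both inputs must cover the same frames")

    merged: List[List[float]] = []
    for nn_frame, gmm_frame in zip(nn_scores, gmm_scores):
        # Every frame must score the same set of states in both models.
        if not (len(nn_frame) == len(gmm_frame)
                == len(nn_weights) == len(gmm_weights)):
            raise ValueError("state counts must match across inputs")
        # Weighted sum for each state within this single frame.
        merged.append([
            w_nn * nn + w_gmm * gmm
            for nn, gmm, w_nn, w_gmm
            in zip(nn_frame, gmm_frame, nn_weights, gmm_weights)
        ])
    return merged


# Two frames, three HMM states; all values are made up for illustration.
nn = [[0.5, 0.25, 0.25], [0.0, 1.0, 0.0]]
gmm = [[0.25, 0.5, 0.25], [0.5, 0.5, 0.0]]
merged = merge_state_predictions(
    nn, gmm,
    nn_weights=[0.75, 0.5, 0.25],
    gmm_weights=[0.25, 0.5, 0.75],
)
print(merged)
```

In a full recognizer, the merged table would take the place of a single model's per-frame state scores when decoding with the HMMs; that decoding step is outside the scope of this sketch.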

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.