Frame-level combination of deep neural network and Gaussian mixture models
US9240184B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Feb 12, 2013 |
| Grant date | Jan 19, 2016 |
| Priority date | — |
| Expiry date | Dec 5, 2033 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L15/142
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method and system for frame-level merging of hidden Markov model (HMM) state predictions determined by different techniques is disclosed. An audio input signal may be transformed into a first and a second sequence of feature vectors, the sequences corresponding to each other and to a temporal sequence of frames of the audio input signal on a frame-by-frame basis. The first sequence may be processed by a neural network (NN) to determine NN-based state predictions, and the second sequence may be processed by a Gaussian mixture model (GMM) to determine GMM-based state predictions. The NN-based and GMM-based state predictions may be merged as weighted sums for each of a plurality of HMM states on a frame-by-frame basis to determine merged state predictions. The merged state predictions may then be applied to the HMMs to determine speech content of the audio input signal.
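The merging step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `merge_frame_predictions`, the use of one weight per HMM state for each model, and the treatment of predictions as plain per-state scores are all assumptions made for illustration. The abstract does not say how the weights are chosen or whether the predictions are probabilities or log-likelihoods.

```python
from typing import List, Sequence


def merge_frame_predictions(
    nn_scores: Sequence[Sequence[float]],
    gmm_scores: Sequence[Sequence[float]],
    nn_weights: Sequence[float],
    gmm_weights: Sequence[float],
) -> List[List[float]]:
    """Merge NN-based and GMM-based state scores frame by frame.

    nn_scores[t][s] and gmm_scores[t][s] hold the score of HMM state s
    at frame t. The per-state weights are a hypothetical choice; the
    source only says the predictions are merged as weighted sums.
    """
    if len(nn_scores) != len(gmm_scores):
        raise ValueError("both sequences must cover the same frames")

    num_states = len(nn_weights)
    if len(gmm_weights) != num_states:
        raise ValueError("weight vectors must have one entry per state")

    merged: List[List[float]] = []
    for nn_frame, gmm_frame in zip(nn_scores, gmm_scores):
        if len(nn_frame) != num_states or len(gmm_frame) != num_states:
            raise ValueError("each frame needs one score per state")
        # Weighted sum for each HMM state within this frame.
        merged.append([
            w_nn * p_nn + w_gmm * p_gmm
            for p_nn, p_gmm, w_nn, w_gmm in zip(
                nn_frame, gmm_frame, nn_weights, gmm_weights
            )
        ])
    return merged


# Two frames, three HMM states, equal weighting of both models
# (illustrative values only, not data from the patent).
nn = [[0.5, 0.25, 0.25], [0.0, 1.0, 0.0]]
gmm = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]
merged = merge_frame_predictions(nn, gmm, [0.5] * 3, [0.5] * 3)
print(merged)
```

In a full recognizer the merged per-frame scores would then be passed to the HMM decoder in place of the scores from either model alone; that decoding step is outside the scope of this sketch.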
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.