Patent · US Active

Frame-level combination of deep neural network and Gaussian mixture models

US9240184B1 · kind B1 · utility

37 Cited by

47 References

23 Claims

0 Family size

Assignee

Google LLC · US

Inventors

Hui Lin · Sunnyvale, US
Xin Lei · Nanhu, CN
Vincent O. Vanhoucke · San Francisco, US

Key dates

Filing date	Feb 12, 2013
Grant date	Jan 19, 2016
Priority date	—
Expiry date	Dec 5, 2033

Classification

Technology area (CPC G)	Physics
CPC primary	G10L15/142
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A method and system for frame-level merging of hidden Markov model (HMM) state predictions determined by different techniques is disclosed. An audio input signal may be transformed into first and second sequences of feature vectors, the sequences corresponding to each other and to a temporal sequence of frames of the audio input signal on a frame-by-frame basis. The first sequence may be processed by a neural network (NN) to determine NN-based state predictions, and the second sequence may be processed by a Gaussian mixture model (GMM) to determine GMM-based state predictions. The NN-based and GMM-based state predictions may be merged as weighted sums for each of a plurality of HMM states on a frame-by-frame basis to determine merged state predictions. The merged state predictions may then be applied to the HMMs to determine speech content of the audio input signal.
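
The merging step described in the abstract can be illustrated with a minimal sketch. It assumes that both models have already produced one score per HMM state for every frame, and that each state has its own pair of weights. The function name `merge_state_predictions`, the per-state weight layout, the weight values, and the toy scores are hypothetical and chosen for illustration only; they are not taken from the patent.

```python
from typing import List, Sequence


def merge_state_predictions(
    nn_scores: Sequence[Sequence[float]],
    gmm_scores: Sequence[Sequence[float]],
    nn_weights: Sequence[float],
    gmm_weights: Sequence[float],
) -> List[List[float]]:
    """Merge NN-based and GMM-based scores as per-state weighted sums.

    nn_scores[t][s] and gmm_scores[t][s] hold the score each model assigns
    to HMM state s at frame t. nn_weights[s] and gmm_weights[s] are the
    weights for state s (an assumed layout, not specified by the abstract).
    """
    if len(nn_scores) != len(gmm_scores):
        raise ValueError("both models must score the same number of frames")

    merged: List[List[float]] = []
    for nn_frame, gmm_frame in zip(nn_scores, gmm_scores):
        n_states = len(nn_weights)
        if not (len(nn_frame) == len(gmm_frame) == len(gmm_weights) == n_states):
            raise ValueError("state counts must match across scores and weights")
        # One weighted sum per HMM state, computed independently for this frame.
        merged.append([
            w_nn * s_nn + w_gmm * s_gmm
            for w_nn, s_nn, w_gmm, s_gmm in zip(
                nn_weights, nn_frame, gmm_weights, gmm_frame
            )
        ])
    return merged


# Toy input: 2 frames, 3 HMM states, equal weighting of both models.
nn = [[0.5, 0.25, 0.25], [0.0, 1.0, 0.0]]
gmm = [[0.25, 0.5, 0.25], [0.5, 0.5, 0.0]]
merged = merge_state_predictions(nn, gmm, [0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
print(merged)  # [[0.375, 0.375, 0.25], [0.25, 0.75, 0.0]]
```

In a full recognizer the merged per-frame scores would then be passed to HMM decoding; that step is outside the scope of this sketch.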

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.