Method and apparatus for extracting speech related facial features for use in speech recognition systems
US5771306A · kind A · utility
Assignee
Inventors
Key dates
| Filing date | Oct 22, 1993 |
| Grant date | Jun 23, 1998 |
| Priority date | — |
| Expiry date | Oct 22, 2013 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L25/18
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
The apparatus for the recognition of speech comprises an acoustic preprocessor, a visual preprocessor, and a speech classifier that operates on the acoustic and visual preprocessed data. The acoustic preprocessor comprises a log mel spectrum analyzer that produces an equal mel bandwidth log power spectrum. The visual preprocessor detects the motion of a set of fiducial markers on the speaker's face and extracts a set of normalized distance vectors describing lip and mouth movement. The speech classifier uses a multilevel time-delay neural network operating on the preprocessed acoustic and visual data to form an output probability distribution that indicates the probability of each candidate utterance having been spoken, based on the acoustic and visual data.
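The two preprocessing steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not give the band count, the rectangular band shape, the Hamming window, the mel formula constants, the marker names, or the choice of reference distance, so all of these are assumptions made for illustration. The time-delay neural network classifier is not covered.

```python
import numpy as np


def hz_to_mel(f):
    # A common mel-scale formula (assumed; the abstract does not specify one).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def log_mel_spectrum(frame, sample_rate, n_bands=14):
    """Log power in n_bands bands of equal width on the mel scale.

    n_bands=14 is an arbitrary illustrative value.
    """
    frame = np.asarray(frame, dtype=float)
    power = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    # Band edges are equally spaced in mel, then mapped back to Hz.
    top_mel = hz_to_mel(sample_rate / 2.0)
    edges = mel_to_hz(np.linspace(0.0, top_mel, n_bands + 1))
    band_power = np.empty(n_bands)
    for i in range(n_bands):
        lo, hi = edges[i], edges[i + 1]
        if i == n_bands - 1:
            mask = (freqs >= lo) & (freqs <= hi)  # include the Nyquist bin
        else:
            mask = (freqs >= lo) & (freqs < hi)
        band_power[i] = power[mask].sum()
    # Small offset avoids log(0) for empty or silent bands.
    return np.log(band_power + 1e-12)


def normalized_distances(markers, pairs, ref_pair):
    """Distances between marker pairs, divided by a reference distance.

    markers:  dict mapping a (hypothetical) marker name to (x, y) coordinates
    pairs:    list of (name_a, name_b) whose separation is measured
    ref_pair: (name_a, name_b) used as the normalizing distance
    """
    def dist(a, b):
        return float(np.linalg.norm(np.subtract(markers[a], markers[b])))

    ref = dist(*ref_pair)
    return np.array([dist(a, b) / ref for a, b in pairs])


# Usage with made-up data: one 256-sample frame of a 1 kHz tone at 8 kHz,
# and six hypothetical marker positions in image coordinates.
t = np.arange(256) / 8000.0
acoustic = log_mel_spectrum(np.sin(2 * np.pi * 1000.0 * t), 8000)

markers = {
    "nose": (0.0, 5.0), "chin": (0.0, -5.0),
    "lip_top": (0.0, 1.0), "lip_bottom": (0.0, -1.0),
    "mouth_left": (-2.0, 0.0), "mouth_right": (2.0, 0.0),
}
pairs = [("lip_top", "lip_bottom"), ("mouth_left", "mouth_right")]
visual = normalized_distances(markers, pairs, ref_pair=("nose", "chin"))

# One combined feature vector per frame, as input to a classifier.
features = np.concatenate([acoustic, visual])
```

Dividing by a reference distance makes the visual features insensitive to how large the face appears in the image, which is one plausible reading of "normalized" in the abstract.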
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.