Speech/music discrimination
US9613640B1 · kind B1 · utility
Key dates
| Event | Date |
|---|---|
| Filing date | Jan 14, 2016 |
| Grant date | Apr 4, 2017 |
| Priority date | — |
| Expiry date | Jan 14, 2036 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L25/21
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A speech/music discrimination method evaluates three features: the standard deviation of the separations between envelope peaks, a loudness ratio, and a smoothed energy difference. The envelope is searched for peaks above a threshold, and the standard deviation of the separations between those peaks is calculated. A lower standard deviation is indicative of speech; a higher standard deviation is indicative of non-speech. The method also calculates the ratio between the minimum and maximum loudness in recent input signal data frames. If this ratio corresponds to the dynamic range characteristic of speech, that is a further indication that the input signal is speech content. In addition, smoothed energies of the frames from the left and right input channels are computed and compared. Similar (e.g., highly correlated) left and right channel smoothed energies are indicative of speech; dissimilar (e.g., uncorrelated) left and right channel smoothed energies are indicative of non-speech material. The results of the three tests are compared to make a speech/music decision.
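To make the three tests in the abstract concrete, here is a minimal Python sketch of how they could fit together. The abstract gives no thresholds, peak-picking rule, smoothing constant, or combination rule, so the following are all hypothetical choices rather than the patented method: the threshold values `PEAK_STD_MAX`, `LOUDNESS_RATIO_MAX` and `ENERGY_DIFF_MAX`; the smoothing factor `alpha`; the helper names; the reading that a low min/max loudness ratio (wide dynamic range, as speech pauses produce) is the "speech" case; a normalized absolute difference standing in for the correlation measure; and a two-of-three majority vote.

```python
import math
import statistics
from typing import List, Sequence

# Hypothetical thresholds; the abstract gives no numeric values.
PEAK_STD_MAX = 4.0        # max spread of peak spacing (envelope samples) for "speech"
LOUDNESS_RATIO_MAX = 0.1  # assumed: speech has quiet frames far below loud ones
ENERGY_DIFF_MAX = 0.2     # assumed: speech has near-identical L/R energy tracks


def peak_separation_std(envelope: Sequence[float], threshold: float) -> float:
    """Population std dev of the spacing between local envelope peaks above threshold."""
    peaks = [
        i for i in range(1, len(envelope) - 1)
        if envelope[i] > threshold
        and envelope[i] >= envelope[i - 1]
        and envelope[i] > envelope[i + 1]
    ]
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    if len(gaps) < 2:
        return math.inf  # too few peaks to judge; treated as "not speech"
    return statistics.pstdev(gaps)


def loudness_ratio(frame_loudness: Sequence[float]) -> float:
    """Ratio of minimum to maximum loudness over a non-empty run of recent frames."""
    loudest = max(frame_loudness)
    if loudest <= 0.0:
        return 1.0  # silence: no dynamic range at all
    return min(frame_loudness) / loudest


def smoothed_energy(frames: Sequence[Sequence[float]], alpha: float = 0.9) -> List[float]:
    """Per-frame mean-square energy passed through a one-pole (exponential) smoother."""
    out, state = [], 0.0
    for frame in frames:
        energy = sum(x * x for x in frame) / len(frame)
        state = alpha * state + (1.0 - alpha) * energy
        out.append(state)
    return out


def energy_difference(left: Sequence[float], right: Sequence[float]) -> float:
    """Normalized L/R difference in [0, 1]; 0 means identical smoothed energy tracks."""
    total = sum(l + r for l, r in zip(left, right))
    if total == 0.0:
        return 0.0
    return sum(abs(l - r) for l, r in zip(left, right)) / total


def is_speech(envelope, threshold, frame_loudness, left_frames, right_frames) -> bool:
    """Run the three tests and combine them with an (assumed) majority vote."""
    votes = [
        peak_separation_std(envelope, threshold) <= PEAK_STD_MAX,
        loudness_ratio(frame_loudness) <= LOUDNESS_RATIO_MAX,
        energy_difference(smoothed_energy(left_frames),
                          smoothed_energy(right_frames)) <= ENERGY_DIFF_MAX,
    ]
    return sum(votes) >= 2
```

A majority vote lets any single test be outvoted by the other two; a practical system might instead weight the tests or require agreement over several consecutive decisions, which the abstract leaves open.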
Source: USPTO / EPO open patent data.