Automatic classification of audio content as either primarily speech or primarily non-speech, to facilitate dynamic application of dialogue enhancement
US12300259B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Mar 10, 2022 |
| Grant date | May 13, 2025 |
| Priority date | — |
| Expiry date | Jul 20, 2043 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG10L2025/783
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
A method for dynamically controlling enhancement of an audio stream is provided, where the audio stream defines a sequence of audio segments over time. Each audio segment defines a waveform having a plurality of waveform attributes. For each audio segment of the sequence of audio segments, the method includes: (i) determining a set of waveform-attribute values of the audio segment's waveform attributes, (ii) computing a first distance between the determined set of waveform-attribute values and a first predefined set of waveform-attribute values representative of speech, and computing a second distance between the determined set of waveform-attribute values and a second predefined set of waveform-attribute values representative of music, (iii) using the computed first and second distances as a basis to classify the audio segment as primarily speech or rather primarily music, and (iv) controlling, based on the classifying, whether or not to enhance the audio segment for output.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.