Patent · US Active

Automatic classification of audio content as either primarily speech or primarily non-speech, to facilitate dynamic application of dialogue enhancement

US12300259B2 · kind B2 · utility

0Cited by
3References
17Claims
0Family size

Assignee

Inventors

Key dates

Filing dateMar 10, 2022
Grant dateMay 13, 2025
Priority date
Expiry dateJul 20, 2043

Classification

  • Technology area (CPC G)Physics
  • CPC primaryG10L2025/783
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

A method for dynamically controlling enhancement of an audio stream is provided, where the audio stream defines a sequence of audio segments over time. Each audio segment defines a waveform having a plurality of waveform attributes. For each audio segment of the sequence of audio segments, the method includes: (i) determining a set of waveform-attribute values of the audio segment's waveform attributes, (ii) computing a first distance between the determined set of waveform-attribute values and a first predefined set of waveform-attribute values representative of speech, and computing a second distance between the determined set of waveform-attribute values and a second predefined set of waveform-attribute values representative of music, (iii) using the computed first and second distances as a basis to classify the audio segment as primarily speech or rather primarily music, and (iv) controlling, based on the classifying, whether or not to enhance the audio segment for output.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.