Patent · US Active

Automatic classification of audio content as either primarily speech or primarily non-speech, to facilitate dynamic application of dialogue enhancement

US12300259B2 · kind B2 · utility

0 Cited by

3 References

17 Claims

0 Family size

Assignee

Roku, Inc. · US

Inventors

David Friedman · Austin, US
Alan Bithell · Sheffield, GB
Robert Caston Curtis · San Jose, US

Key dates

Filing date	Mar 10, 2022
Grant date	May 13, 2025
Priority date	—
Expiry date	Jul 20, 2043

Classification

Technology area (CPC G)	Physics
CPC primary	G10L2025/783
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A method for dynamically controlling enhancement of an audio stream is provided, where the audio stream defines a sequence of audio segments over time. Each audio segment defines a waveform having a plurality of waveform attributes. For each audio segment of the sequence of audio segments, the method includes: (i) determining a set of waveform-attribute values of the audio segment's waveform attributes, (ii) computing a first distance between the determined set of waveform-attribute values and a first predefined set of waveform-attribute values representative of speech, and computing a second distance between the determined set of waveform-attribute values and a second predefined set of waveform-attribute values representative of music, (iii) using the computed first and second distances as a basis to classify the audio segment as primarily speech or rather primarily music, and (iv) controlling, based on the classifying, whether or not to enhance the audio segment for output.
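
Steps (i) through (iv) of the abstract can be illustrated with a minimal sketch. The abstract does not name the waveform attributes, the distance metric, or the reference values, so everything specific here is an assumption made for illustration: zero-crossing rate and RMS level as the attributes, Euclidean distance as the metric, and the hypothetical constants `SPEECH_REF` and `MUSIC_REF` as the two predefined sets. It is not the patented implementation.

```python
import math

# Hypothetical reference vectors (zero-crossing rate, RMS level).
# These values are illustrative assumptions, not taken from the patent.
SPEECH_REF = (0.10, 0.20)
MUSIC_REF = (0.30, 0.50)


def waveform_attributes(samples):
    """Step (i): return (zero-crossing rate, RMS) for a non-empty sample list."""
    # Count sign changes between consecutive samples.
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    zcr = crossings / max(len(samples) - 1, 1)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return (zcr, rms)


def classify_segment(attrs, speech_ref=SPEECH_REF, music_ref=MUSIC_REF):
    """Steps (ii)-(iii): compare distances to the two reference vectors."""
    d_speech = math.dist(attrs, speech_ref)  # first distance (speech)
    d_music = math.dist(attrs, music_ref)    # second distance (music)
    # Ties go to speech here; that tie-break is an arbitrary choice.
    return "speech" if d_speech <= d_music else "music"


def should_enhance(samples):
    """Step (iv): enhance only segments classified as primarily speech."""
    return classify_segment(waveform_attributes(samples)) == "speech"
```

In this sketch, a caller would run `should_enhance` on each segment of the stream in turn and switch dialogue enhancement on or off according to the result.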

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.