Patent · US Expired

Methods and apparatuses for segmenting an audio-visual recording using image similarity searching and audio speaker recognition

US6404925B1 · kind B1 · utility

Cited by	218
References	8
Claims	19
Family size	0

Assignees

FUJI XEROX CO., LTD. · JP
Xerox Corporation · US

Inventors

Jonathan Foote · Menlo Park, US
Lynn D. Wilcox · Palo Alto, US

Key dates

Filing date	Mar 11, 1999
Grant date	Jun 11, 2002
Priority date	—
Expiry date	Mar 11, 2019

Classification

Technology area (CPC Y)	Emerging Cross-Sectional Technologies
CPC primary	Y10S707/99933
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Methods are described for segmenting audio-video recordings of meetings that contain slide presentations by one or more speakers. These segments serve as indexes into the recorded meeting. If an agenda is provided for the meeting, the segments can be labeled using information from the agenda. The system automatically detects intervals of video that correspond to presentation slides. Under the assumption that only one person is speaking during an interval when slides are displayed in the video, possible speaker intervals are extracted from the audio soundtrack by finding these regions. Since the same speaker may talk across multiple slide intervals, the acoustic data from these intervals is clustered to yield an estimate of the number of distinct speakers and their order. Merged clustered audio intervals corresponding to a single speaker are then used as training data for a speaker segmentation system. Using speaker identification techniques, the full video is then segmented into individual presentations based on the extent of each presenter's speech. The speaker ide…
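
The clustering step in the abstract can be illustrated with a minimal sketch. The abstract does not say which acoustic features, distance measure, or clustering algorithm are used, so everything below is an assumption made for illustration: each slide interval is summarized by one hypothetical feature vector, intervals are merged by greedy centroid-linkage agglomerative clustering under Euclidean distance, and merging stops at a hypothetical `threshold`. The function names `cluster_intervals` and `merge_by_speaker` are likewise invented here.

```python
from math import dist


def cluster_intervals(features, threshold):
    """Group slide intervals by acoustic similarity (illustrative only).

    features:  one feature vector per slide interval, in time order
               (assumed to be something like a mean spectral vector).
    threshold: assumed stopping distance; clusters farther apart than
               this are treated as different speakers.

    Returns one label per interval. Labels are numbered by first
    appearance, so label order doubles as estimated speaker order.
    """
    clusters = [[i] for i in range(len(features))]

    def centroid(members):
        dim = len(features[0])
        return [sum(features[m][d] for m in members) / len(members)
                for d in range(dim)]

    while len(clusters) > 1:
        # Find the closest pair of clusters by centroid distance.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist(centroid(clusters[a]), centroid(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break  # remaining clusters are assumed to be distinct speakers
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]

    clusters.sort(key=lambda c: c[0])  # order by first appearance in time
    labels = [0] * len(features)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels


def merge_by_speaker(intervals, labels):
    """Collect the (start, end) intervals assigned to each speaker label."""
    merged = {}
    for interval, label in zip(intervals, labels):
        merged.setdefault(label, []).append(interval)
    return merged


# Toy data: four slide intervals (seconds) with made-up 2-D features.
intervals = [(0, 60), (70, 120), (130, 200), (210, 260)]
features = [[1.0, 1.0], [1.1, 0.9], [5.0, 5.0], [5.2, 4.9]]

labels = cluster_intervals(features, threshold=1.0)
training_sets = merge_by_speaker(intervals, labels)
print(len(set(labels)), training_sets)
```

With this toy data the sketch estimates two speakers. The merged intervals per label play the role of the per-speaker training data mentioned in the abstract; how the speaker segmentation system is then trained is not covered by this sketch.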

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.