Metadata-based diarization of teleconferences
US12217760B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jan 30, 2022 |
| Grant date | Feb 4, 2025 |
| Priority date | — |
| Expiry date | Dec 29, 2042 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L21/028
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method for audio processing includes receiving a recording of a teleconference among multiple participants over a network, including an audio stream containing speech uttered by the participants and information outside the audio stream. The method further includes processing the audio stream to identify speech segments interspersed with intervals of silence, extracting speaker identifications from the information outside the audio stream in the received recording, labeling a first set of the identified speech segments from the audio stream with the speaker identifications, extracting acoustic features from the speech segments in the first set, learning a correlation between the speaker identifications labelled to the segments in the first set and the extracted acoustic features, and labeling a second set of the identified speech segments using the learned correlation, to indicate the participants who spoke during the speech segments in the second set.
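The labeling flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Segment` class, the function names, the `(start, end, name)` format of the metadata events, and the two-element feature vectors are all hypothetical, and a nearest-centroid rule stands in for whatever "learned correlation" the patent actually claims. Speech segmentation and acoustic feature extraction are assumed to have been done already.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    start: float                   # segment start time, seconds
    end: float                     # segment end time, seconds
    features: list                 # hypothetical acoustic feature vector
    speaker: Optional[str] = None  # filled in by the labeling steps


def label_from_metadata(segments, speaker_events):
    """First set: label a segment only when exactly one speaker
    identification from the conference metadata overlaps it.
    speaker_events is an assumed list of (start, end, name) tuples."""
    for seg in segments:
        names = {name for (s, e, name) in speaker_events
                 if s < seg.end and e > seg.start}
        if len(names) == 1:
            seg.speaker = names.pop()


def learn_centroids(segments):
    """Learn a simple speaker-to-features correlation: the mean
    feature vector of each speaker's metadata-labeled segments."""
    sums, counts = {}, {}
    for seg in segments:
        if seg.speaker is None:
            continue
        acc = sums.setdefault(seg.speaker, [0.0] * len(seg.features))
        for i, value in enumerate(seg.features):
            acc[i] += value
        counts[seg.speaker] = counts.get(seg.speaker, 0) + 1
    return {spk: [v / counts[spk] for v in acc] for spk, acc in sums.items()}


def label_remaining(segments, centroids):
    """Second set: assign each unlabeled segment to the speaker
    whose centroid is closest in squared Euclidean distance."""
    if not centroids:
        return
    for seg in segments:
        if seg.speaker is None:
            seg.speaker = min(
                centroids,
                key=lambda spk: sum((a - b) ** 2
                                    for a, b in zip(seg.features, centroids[spk])),
            )


# Toy data: two segments with unambiguous metadata, one where the
# metadata names two speakers at once, and one with no metadata at all.
segments = [
    Segment(0, 2, [1.0, 0.0]),
    Segment(3, 5, [0.0, 1.0]),
    Segment(6, 8, [0.9, 0.1]),
    Segment(9, 10, [0.1, 0.8]),
]
events = [(0, 2.5, "alice"), (2.8, 5.2, "bob"), (6, 8, "alice"), (6, 8, "bob")]

label_from_metadata(segments, events)
label_remaining(segments, learn_centroids(segments))
print([seg.speaker for seg in segments])  # ['alice', 'bob', 'alice', 'bob']
```

In this sketch the first two segments are labeled directly from the metadata, while the last two are labeled acoustically because the metadata is ambiguous or absent for them. That split is one plausible reading of the "first set" and "second set" in the abstract.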
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.