Fully supervised speaker diarization
US11031017B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jan 8, 2019 |
| Grant date | Jun 8, 2021 |
| Priority date | — |
| Expiry date | Jun 6, 2039 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L25/87
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method includes receiving an utterance of speech and segmenting the utterance of speech into a plurality of segments. For each segment of the utterance of speech, the method also includes extracting a speaker-discriminative embedding from the segment and predicting a probability distribution over possible speakers for the segment using a probabilistic generative model configured to receive the extracted speaker-discriminative embedding as a feature input. The probabilistic generative model is trained on a corpus of training speech utterances, each segmented into a plurality of training segments. Each training segment includes a corresponding speaker-discriminative embedding and a corresponding speaker label. The method also includes assigning a speaker label to each segment of the utterance of speech based on the probability distribution over possible speakers for the corresponding segment.
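The steps in the abstract (segment, embed, predict a speaker distribution, assign a label) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the function names, the fixed-length segmentation, the toy two-value embedding, and the generative model, which here is a stand-in that fits one isotropic Gaussian per speaker from labeled training embeddings and scores each segment independently under a uniform prior.

```python
import math
from typing import Dict, List, Sequence


def segment(samples: Sequence[float], segment_len: int) -> List[Sequence[float]]:
    """Split samples into fixed-length, non-overlapping segments.

    Assumption: a trailing partial segment is dropped.
    """
    last_start = len(samples) - segment_len
    return [samples[i:i + segment_len] for i in range(0, last_start + 1, segment_len)]


def embed(seg: Sequence[float]) -> List[float]:
    """Toy stand-in for a speaker-discriminative embedding: [mean, mean |x|]."""
    n = len(seg)
    return [sum(seg) / n, sum(abs(x) for x in seg) / n]


def fit_speaker_means(embeddings: List[List[float]],
                      labels: List[str]) -> Dict[str, List[float]]:
    """'Train' the stand-in model: average the labeled embeddings per speaker."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for emb, label in zip(embeddings, labels):
        acc = sums.setdefault(label, [0.0] * len(emb))
        for i, value in enumerate(emb):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def speaker_distribution(embedding: List[float],
                         means: Dict[str, List[float]],
                         variance: float = 1.0) -> Dict[str, float]:
    """Posterior over speakers under isotropic Gaussians and a uniform prior."""
    log_liks = {
        label: -sum((a - b) ** 2 for a, b in zip(embedding, mean)) / (2.0 * variance)
        for label, mean in means.items()
    }
    peak = max(log_liks.values())  # subtract the max for numerical stability
    weights = {label: math.exp(v - peak) for label, v in log_liks.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def diarize(samples: Sequence[float], segment_len: int,
            means: Dict[str, List[float]]) -> List[str]:
    """Assign each segment the most probable speaker label."""
    labels = []
    for seg in segment(samples, segment_len):
        dist = speaker_distribution(embed(seg), means)
        labels.append(max(dist, key=dist.get))
    return labels


# Hypothetical training data: one labeled segment per speaker.
train_segments = [[1.0, 1.1, 0.9, 1.0], [-1.0, -0.9, -1.1, -1.0]]
train_labels = ["A", "B"]
model = fit_speaker_means([embed(s) for s in train_segments], train_labels)

# Hypothetical utterance: speaker "A" followed by speaker "B".
utterance = [0.9, 1.0, 1.1, 1.0, -1.0, -1.1, -0.9, -1.0]
print(diarize(utterance, segment_len=4, means=model))
```

Scoring segments independently keeps the sketch short. A model that also conditions on the labels of earlier segments would fit the same interface by passing that history into `speaker_distribution`.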
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.