Patent · US Active

Fully supervised speaker diarization

US11688404B2 · kind B2 · utility

Cited by: 2
References: 15
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: May 26, 2021
Grant date: Jun 27, 2023
Priority date
Expiry date: Dec 24, 2041

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G10L25/87
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method includes receiving an utterance of speech and segmenting the utterance of speech into a plurality of segments. For each segment of the utterance of speech, the method also includes extracting a speaker-discriminative embedding from the segment and predicting a probability distribution over possible speakers for the segment using a probabilistic generative model configured to receive the extracted speaker-discriminative embedding as a feature input. The probabilistic generative model is trained on a corpus of training speech utterances, each segmented into a plurality of training segments. Each training segment includes a corresponding speaker-discriminative embedding and a corresponding speaker label. The method also includes assigning a speaker label to each segment of the utterance of speech based on the probability distribution over possible speakers for the corresponding segment.
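
The steps in the abstract (segment, embed, predict a distribution over speakers, assign a label) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the patented implementation: the fixed-length segmentation, the `embed` function (a plain mean of feature frames standing in for a learned speaker-discriminative embedding), and the generative model (unit-variance Gaussian class conditionals fitted from labeled training embeddings, scoring each segment independently). All function names are hypothetical.

```python
import math
from collections import defaultdict


def segment(frames, size):
    """Split a list of feature frames into fixed-length segments (assumed scheme)."""
    return [frames[i:i + size] for i in range(0, len(frames), size)]


def embed(seg):
    """Hypothetical stand-in for a speaker-discriminative embedding: the frame mean."""
    return [sum(col) / len(seg) for col in zip(*seg)]


def train(embeddings, labels):
    """Fit a per-speaker mean and prior from labeled training embeddings."""
    groups = defaultdict(list)
    for emb, label in zip(embeddings, labels):
        groups[label].append(emb)
    total = len(labels)
    return {
        label: ([sum(col) / len(embs) for col in zip(*embs)], len(embs) / total)
        for label, embs in groups.items()
    }


def predict(model, emb, var=1.0):
    """Return a probability distribution over speakers for one embedding."""
    log_scores = {}
    for label, (mean, prior) in model.items():
        sq_dist = sum((a - b) ** 2 for a, b in zip(emb, mean))
        log_scores[label] = math.log(prior) - sq_dist / (2.0 * var)
    top = max(log_scores.values())  # subtract the max for numerical stability
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(weights.values())
    return {label: w / norm for label, w in weights.items()}


def diarize(model, frames, size):
    """Assign each segment the speaker with the highest predicted probability."""
    labels = []
    for seg in segment(frames, size):
        dist = predict(model, embed(seg))
        labels.append(max(dist, key=dist.get))
    return labels


# Toy labeled training embeddings for two speakers (made-up values).
model = train([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]], ["A", "A", "B", "B"])
frames = [[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.0, 5.2]]
print(diarize(model, frames, size=2))
```

The supervised aspect shows up in `train`, which consumes an embedding and a speaker label for every training segment. A real system would replace `embed` with a trained speaker encoder and would likely use a richer model than independent Gaussians.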

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.