Patent · US Active

Fully supervised speaker diarization

US11031017B2 · kind B2 · utility

Cited by: 1
References: 15
Claims: 28
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jan 8, 2019
Grant date: Jun 8, 2021
Priority date
Expiry date: Jun 6, 2039

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G10L25/87
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method includes receiving an utterance of speech and segmenting the utterance of speech into a plurality of segments. For each segment of the utterance of speech, the method also includes extracting a speaker-discriminative embedding from the segment and predicting a probability distribution over possible speakers for the segment using a probabilistic generative model configured to receive the extracted speaker-discriminative embedding as a feature input. The probabilistic generative model is trained on a corpus of training speech utterances, each segmented into a plurality of training segments. Each training segment includes a corresponding speaker-discriminative embedding and a corresponding speaker label. The method also includes assigning a speaker label to each segment of the utterance of speech based on the probability distribution over possible speakers for the corresponding segment.
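
The steps in the abstract (segment, embed, predict a distribution over speakers, assign a label) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not taken from the patent: the function names are hypothetical, the "embedding" is a toy (mean, standard deviation) pair rather than a learned speaker-discriminative embedding, and the generative model is a simple per-speaker isotropic Gaussian with a shared variance that scores each segment independently. The abstract does not specify the model at this level of detail.

```python
import math
from collections import defaultdict

def segment(samples, seg_len):
    """Split samples into fixed-length, non-overlapping segments (assumed scheme)."""
    return [samples[i:i + seg_len]
            for i in range(0, len(samples) - seg_len + 1, seg_len)]

def embed(seg):
    """Toy stand-in for a speaker-discriminative embedding: (mean, std)."""
    mean = sum(seg) / len(seg)
    var = sum((x - mean) ** 2 for x in seg) / len(seg)
    return (mean, math.sqrt(var))

def train(embeddings, labels):
    """Fit a centroid and a prior per speaker from labeled training embeddings."""
    groups = defaultdict(list)
    for emb, lab in zip(embeddings, labels):
        groups[lab].append(emb)
    model = {}
    for lab, embs in groups.items():
        centroid = tuple(sum(dim) / len(embs) for dim in zip(*embs))
        model[lab] = (centroid, len(embs) / len(labels))
    return model

def predict_proba(model, emb, var=1.0):
    """Posterior over speakers via Bayes' rule under isotropic Gaussians."""
    log_scores = {}
    for lab, (centroid, prior) in model.items():
        sq_dist = sum((a - b) ** 2 for a, b in zip(emb, centroid))
        log_scores[lab] = math.log(prior) - sq_dist / (2 * var)
    top = max(log_scores.values())  # subtract the max for numerical stability
    weights = {lab: math.exp(s - top) for lab, s in log_scores.items()}
    total = sum(weights.values())
    return {lab: w / total for lab, w in weights.items()}

def diarize(model, samples, seg_len):
    """Assign each segment the speaker with the highest predicted probability."""
    labels = []
    for seg in segment(samples, seg_len):
        probs = predict_proba(model, embed(seg))
        labels.append(max(probs, key=probs.get))
    return labels

# Made-up training utterance: two segments from speaker "A", two from "B".
train_samples = [0.0, 0.2, 0.0, 0.2] * 2 + [5.0, 5.4, 5.0, 5.4] * 2
train_labels = ["A", "A", "B", "B"]
model = train([embed(s) for s in segment(train_samples, 4)], train_labels)

# Made-up test utterance with speaker turns A, B, A.
test_samples = [0.1, 0.1, 0.1, 0.1, 5.1, 5.3, 5.1, 5.3, 0.0, 0.2, 0.0, 0.2]
result = diarize(model, test_samples, 4)
first_probs = predict_proba(model, embed(test_samples[:4]))
print(result)
```

Scoring segments independently keeps the sketch short; a model that conditions each prediction on earlier segments would be a natural refinement, but that design is not described in the abstract and is left out here.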

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.