Patent · US Active

Fully supervised speaker diarization

US11031017B2 · kind B2 · utility

Cited by	1
References	15
Claims	28
Family size	0

Assignee

Google LLC · US

Inventors

Chong Wang · Redmond, US
Aonan Zhang · Mountain View, US
Quan Wang · Hoboken, US
Zhenyao Zhu · Mountain View, US

Key dates

Filing date	Jan 8, 2019
Grant date	Jun 8, 2021
Priority date	—
Expiry date	Jun 6, 2039

Classification

Technology area (CPC G)	Physics
CPC primary	G10L25/87
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A method includes receiving an utterance of speech and segmenting the utterance of speech into a plurality of segments. For each segment of the utterance of speech, the method also includes extracting a speaker-discriminative embedding from the segment and predicting a probability distribution over possible speakers for the segment using a probabilistic generative model configured to receive the extracted speaker-discriminative embedding as a feature input. The probabilistic generative model is trained on a corpus of training speech utterances, each segmented into a plurality of training segments. Each training segment includes a corresponding speaker-discriminative embedding and a corresponding speaker label. The method also includes assigning a speaker label to each segment of the utterance of speech based on the probability distribution over possible speakers for the corresponding segment.
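
The flow described in the abstract (segment, embed, predict a distribution over speakers, assign a label) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the function names, the fixed-length segmentation, the toy two-number embedding, and the model itself, which here is a simple stand-in (one isotropic Gaussian per speaker with equal priors and variance 0.5, so the posterior reduces to a softmax over negative squared distances to per-speaker centroids). The abstract does not specify these details.

```python
import math
from collections import defaultdict

def segment(samples, seg_len):
    """Split a sample sequence into fixed-length, non-overlapping segments (assumed scheme)."""
    return [samples[i:i + seg_len] for i in range(0, len(samples) - seg_len + 1, seg_len)]

def embed(seg):
    """Hypothetical stand-in for a speaker-discriminative embedding: (mean, mean abs. deviation)."""
    mean = sum(seg) / len(seg)
    spread = sum(abs(x - mean) for x in seg) / len(seg)
    return (mean, spread)

def train(embeddings, labels):
    """Fit one centroid per speaker from labeled training-segment embeddings."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for (a, b), label in zip(embeddings, labels):
        acc = sums[label]
        acc[0] += a
        acc[1] += b
        acc[2] += 1
    return {label: (acc[0] / acc[2], acc[1] / acc[2]) for label, acc in sums.items()}

def predict_distribution(model, emb):
    """Posterior over speakers: softmax of negative squared distance to each centroid."""
    scores = {label: -((emb[0] - c[0]) ** 2 + (emb[1] - c[1]) ** 2)
              for label, c in model.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

def diarize(model, samples, seg_len):
    """Assign each segment the most probable speaker label."""
    labels = []
    for seg in segment(samples, seg_len):
        dist = predict_distribution(model, embed(seg))
        labels.append(max(dist, key=dist.get))
    return labels

# Toy training utterance: two labeled training segments of four samples each.
train_samples = [0.1, 0.2, 0.1, 0.2, 5.0, 5.2, 5.1, 4.9]
train_embeddings = [embed(s) for s in segment(train_samples, 4)]
model = train(train_embeddings, ["A", "B"])

test_samples = [0.0, 0.3, 0.1, 0.2, 5.1, 5.0, 4.8, 5.3, 0.2, 0.1, 0.2, 0.1]
print(diarize(model, test_samples, 4))  # prints ['A', 'B', 'A']
```

The sketch treats each segment independently. A model that also conditions on the labels of earlier segments would need a sequential formulation, which this example does not attempt.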

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.