Patent · US Active

Fully supervised speaker diarization

US11688404B2 · kind B2 · utility

Cited by	2

References	15

Claims	20

Family size	0

Assignee

Google LLC · US

Inventors

Chong Wang · Redmond, US
Aonan Zhang · Mountain View, US
Quan Wang · Hoboken, US
Zhenyao Zhu · Mountain View, US

Key dates

Filing date	May 26, 2021
Grant date	Jun 27, 2023
Priority date	—
Expiry date	Dec 24, 2041

Classification

Technology area (CPC G)	Physics
CPC primary	G10L25/87
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A method includes receiving an utterance of speech and segmenting the utterance of speech into a plurality of segments. For each segment of the utterance of speech, the method also includes extracting a speaker-discriminative embedding from the segment and predicting a probability distribution over possible speakers for the segment using a probabilistic generative model configured to receive the extracted speaker-discriminative embedding as a feature input. The probabilistic generative model is trained on a corpus of training speech utterances, each segmented into a plurality of training segments. Each training segment includes a corresponding speaker-discriminative embedding and a corresponding speaker label. The method also includes assigning a speaker label to each segment of the utterance of speech based on the probability distribution over possible speakers for the corresponding segment.
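
The steps in the abstract (segment, embed, predict a distribution over speakers, assign a label) can be illustrated with a minimal Python sketch. This is not the patented model. As a stand-in it assumes one isotropic Gaussian per speaker with a uniform prior, fitted on labeled training embeddings. The function names (segment, embed, train, predict, diarize), the two-number toy embedding, and the fixed segment length are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def segment(samples, seg_len):
    """Split an utterance (a list of samples) into fixed-length segments."""
    return [samples[i:i + seg_len] for i in range(0, len(samples), seg_len)]


def embed(seg):
    """Toy stand-in for a speaker-discriminative embedding (assumption):
    the mean and the mean absolute value of the segment."""
    n = len(seg)
    return (sum(seg) / n, sum(abs(x) for x in seg) / n)


def train(embeddings, labels):
    """Fit one centroid per speaker from labeled training embeddings."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for e, lab in zip(embeddings, labels):
        sums[lab][0] += e[0]
        sums[lab][1] += e[1]
        counts[lab] += 1
    return {lab: (s[0] / counts[lab], s[1] / counts[lab])
            for lab, s in sums.items()}


def predict(model, e, var=1.0):
    """Probability distribution over speakers for one embedding, assuming an
    isotropic Gaussian per speaker and a uniform prior (illustrative only)."""
    logp = {lab: -((e[0] - c[0]) ** 2 + (e[1] - c[1]) ** 2) / (2 * var)
            for lab, c in model.items()}
    m = max(logp.values())  # subtract the max for numerical stability
    z = sum(math.exp(v - m) for v in logp.values())
    return {lab: math.exp(v - m) / z for lab, v in logp.items()}


def diarize(model, samples, seg_len):
    """Assign each segment the most probable speaker label."""
    labels = []
    for seg in segment(samples, seg_len):
        dist = predict(model, embed(seg))
        labels.append(max(dist, key=dist.get))
    return labels


# Made-up training data: two segments per speaker, each with a speaker label.
train_segs = [[1.0, 1.2], [0.8, 1.0], [-1.0, -0.9], [-1.1, -1.0]]
train_labels = ["A", "A", "B", "B"]
model = train([embed(s) for s in train_segs], train_labels)

# Made-up test utterance with three two-sample segments.
result = diarize(model, [1.0, 1.1, -1.0, -1.2, 0.9, 1.0], seg_len=2)
print(result)  # ['A', 'B', 'A']
```

The sketch scores each segment independently. A sequence-aware generative model could additionally condition on the labels of earlier segments. The abstract does not specify the model at that level of detail, so the sketch leaves it out.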

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.