Patent · US Active

End-to-end speech diarization via iterative speaker embedding

US11887623B2 · kind B2 · utility

0 Cited by

0 References

26 Claims

0 Family size

Assignee

Google LLC · US

Inventors

David Grangier · Kirkland, US
Neil Zeghidour · Paris, FR
Oliver Teboul · Mountain View, US

Key dates

Filing date	Jun 22, 2021
Grant date	Jan 30, 2024
Priority date	—
Expiry date	Jul 1, 2041

Classification

Technology area (CPC G)	Physics
CPC primary	G10L19/008
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A method includes receiving an input audio signal corresponding to utterances spoken by multiple speakers. The method also includes encoding the input audio signal into a sequence of T temporal embeddings. During each of a plurality of iterations, each corresponding to a respective speaker of the multiple speakers, the method includes selecting a respective speaker embedding for the respective speaker by determining, for each temporal embedding in the sequence, a probability that the temporal embedding includes a presence of voice activity by a single new speaker for which a speaker embedding was not previously selected during a previous iteration, and selecting, as the respective speaker embedding, the temporal embedding associated with the highest such probability. The method also includes, at each time step, predicting a respective voice activity indicator for each respective speaker of the multiple speakers based on the respective speaker embeddings selected during the plurality of iterations and the temporal embedding at that time step.
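
The control flow described in the abstract can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the encoder is omitted (the temporal embeddings `h` are given directly), and the learned selector and voice activity predictor are replaced by hand-written toy scores based on vector norm and cosine similarity. All function names, the `threshold` parameter, and the choice of the highest-probability time step via `argmax` are assumptions made for illustration only.

```python
import numpy as np

def _unit(x):
    # Normalize along the last axis; the epsilon avoids division by zero.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)

def new_speaker_probability(h, selected):
    """Toy stand-in (assumption) for a learned selector.

    Returns one score in [0, 1] per time step. The score is low where h[t]
    is silent or resembles an already selected speaker embedding.
    """
    energy = np.tanh(np.linalg.norm(h, axis=-1))  # crude voice activity proxy
    novelty = np.ones(len(h))
    for s in selected:
        sim = np.clip(_unit(h) @ _unit(s), 0.0, 1.0)
        novelty *= 1.0 - sim
    return energy * novelty

def select_speaker_embeddings(h, num_speakers):
    """One iteration per speaker: pick the temporal embedding that most
    likely contains a single, not yet selected speaker."""
    selected = []
    for _ in range(num_speakers):
        p = new_speaker_probability(h, selected)
        selected.append(h[int(np.argmax(p))])
    return np.stack(selected)

def predict_voice_activity(h, speakers, threshold=0.5):
    """Toy stand-in (assumption) for the voice activity predictor: speaker n
    is marked active at time t if h[t] is close to speakers[n] in cosine
    similarity. Returns a boolean array of shape (T, num_speakers)."""
    sim = _unit(h) @ _unit(speakers).T
    return sim > threshold
```

For example, with `h = np.array([[2., 0.], [2., 0.], [0., 0.], [0., 3.], [0., 3.], [0., 3.]])`, calling `select_speaker_embeddings(h, 2)` followed by `predict_voice_activity(h, speakers)` yields one column of activity flags per selected speaker and no activity at the silent third time step.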

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.