Patent · US Active

Speaker diarization using an end-to-end model

US11545157B2 · kind B2 · utility

Cited by	2
References	0
Claims	15
Family size	0

Assignee

Google LLC · US

Inventors

Quan Wang · Hoboken, US
Yash Sheth · Sunnyvale, US
Ignacio Lopez Moreno · New York, US
Li Wan · Beijing, CN

Key dates

Filing date	Apr 15, 2019
Grant date	Jan 3, 2023
Priority date	—
Expiry date	Aug 4, 2039

Classification

Technology area (CPC G)	Physics
CPC primary	G10L2021/02165
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Techniques are described for training and/or utilizing an end-to-end speaker diarization model. In various implementations, the model is a recurrent neural network (RNN) model, such as an RNN model that includes at least one memory layer, such as a long short-term memory (LSTM) layer. Audio features of audio data can be applied as input to an end-to-end speaker diarization model trained according to implementations disclosed herein, and the model utilized to process the audio features to generate, as direct output over the model, speaker diarization results. Further, the end-to-end speaker diarization model can be a sequence-to-sequence model, where the sequence can have variable length. Accordingly, the model can be utilized to generate speaker diarization results for any of various length audio segments.
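
The data flow the abstract describes (variable-length audio features in, diarization results out of a recurrent model with an LSTM layer) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class name `TinyLSTMDiarizer`, its parameters, and the choice of one speaker index per frame as the output format are all assumptions, the weights are random and untrained, and bias terms are omitted for brevity.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMDiarizer:
    """Hypothetical sketch: one LSTM layer plus a per-frame speaker classifier.

    Weights are random and untrained, so the labels are meaningless; the point
    is only the sequence-in, sequence-out shape of the computation.
    """

    def __init__(self, feat_dim, hidden_dim, max_speakers, seed=0):
        rng = random.Random(seed)
        in_dim = feat_dim + hidden_dim

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        # One weight matrix per gate: input, forget, output, candidate.
        self.gates = {name: mat(hidden_dim, in_dim) for name in ("i", "f", "o", "g")}
        self.out = mat(max_speakers, hidden_dim)
        self.hidden_dim = hidden_dim

    @staticmethod
    def _matvec(m, v):
        return [sum(w * x for w, x in zip(row, v)) for row in m]

    def diarize(self, frames):
        """Map a variable-length list of feature frames to one speaker index per frame."""
        h = [0.0] * self.hidden_dim  # hidden state
        c = [0.0] * self.hidden_dim  # memory cell state
        labels = []
        for frame in frames:
            x = list(frame) + h  # concatenate current features with previous hidden state
            i = [sigmoid(z) for z in self._matvec(self.gates["i"], x)]
            f = [sigmoid(z) for z in self._matvec(self.gates["f"], x)]
            o = [sigmoid(z) for z in self._matvec(self.gates["o"], x)]
            g = [math.tanh(z) for z in self._matvec(self.gates["g"], x)]
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            scores = self._matvec(self.out, h)
            # Pick the highest-scoring speaker slot for this frame.
            labels.append(max(range(len(scores)), key=scores.__getitem__))
        return labels


if __name__ == "__main__":
    model = TinyLSTMDiarizer(feat_dim=4, hidden_dim=8, max_speakers=2)
    rng = random.Random(1)
    short = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
    longer = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(12)]
    # The same model handles both lengths: prints "5 12".
    print(len(model.diarize(short)), len(model.diarize(longer)))
```

Because the recurrent state is carried frame by frame, the same weights apply to a segment of any length, which is the property the abstract's last two sentences point to. A trained model would learn the gate and output weights from labeled audio rather than drawing them at random.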

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.