Patent · US Active

Target speaker mode

US12217761B2 · kind B2 · utility

Cited by	0

References	4

Claims	20

Family size	0

Assignee

Zoom Video Communications, Inc. · US

Inventors

Yuhui Chen · Tangxia, CN
Qiyong Liu · Hangzhou City, CN
Zhengwei Wei · Qiaonanxiang, CN
Yangbin Zeng · Zhejiang, CN

Key dates

Filing date	Oct 31, 2021
Grant date	Feb 4, 2025
Priority date	—
Expiry date	Oct 31, 2041

Classification

Technology area (CPC G)	Physics
CPC primary	G10L2021/02087
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, relate to a method for target speaker extraction. A target speaker extraction system receives an audio frame of an audio signal. A multi-speaker detection model analyzes the audio frame to determine whether it includes only a single speaker or multiple speakers. When the audio frame includes only a single speaker, the system inputs the audio frame to a target speaker VAD model, which suppresses speech from a non-target speaker by comparing the audio frame to a voiceprint of the target speaker. When the audio frame includes multiple speakers, the system inputs the audio frame to a speech separation model, which separates the voice of the target speaker from the voice mixture in the audio frame.
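
The per-frame routing described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the function extract_target_speaker and the three toy callables (toy_detector, toy_vad, toy_separate) are hypothetical placeholders standing in for the multi-speaker detection model, the target speaker VAD model, and the speech separation model, none of which are specified in this record.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]


def extract_target_speaker(
    frame: Frame,
    voiceprint: Sequence[float],
    is_multi_speaker: Callable[[Frame], bool],
    target_vad: Callable[[Frame, Sequence[float]], List[float]],
    separate: Callable[[Frame, Sequence[float]], List[float]],
) -> List[float]:
    """Route one audio frame to the VAD path or the separation path."""
    if is_multi_speaker(frame):
        # Multiple speakers: pull the target voice out of the mixture.
        return separate(frame, voiceprint)
    # Single speaker: keep or suppress the frame based on the voiceprint.
    return target_vad(frame, voiceprint)


# --- Toy placeholders (assumptions for illustration, not real models) ---

def toy_detector(frame: Frame) -> bool:
    # Pretend that any sample louder than 1.0 indicates overlapping speech.
    return any(abs(x) > 1.0 for x in frame)


def toy_vad(frame: Frame, voiceprint: Sequence[float]) -> List[float]:
    # Pretend a positive dot product means "this is the target speaker".
    score = sum(a * b for a, b in zip(frame, voiceprint))
    return list(frame) if score > 0 else [0.0] * len(frame)


def toy_separate(frame: Frame, voiceprint: Sequence[float]) -> List[float]:
    # Pretend separation is clipping to [-1, 1]; the voiceprint is unused here.
    return [max(-1.0, min(1.0, x)) for x in frame]


voiceprint = [1.0, 1.0]
kept = extract_target_speaker([0.5, 0.5], voiceprint, toy_detector, toy_vad, toy_separate)
muted = extract_target_speaker([-0.5, -0.5], voiceprint, toy_detector, toy_vad, toy_separate)
mixed = extract_target_speaker([1.5, -0.2], voiceprint, toy_detector, toy_vad, toy_separate)
```

Passing the three models in as callables keeps the routing decision separate from the models themselves, so real detection, VAD, and separation components could be substituted without changing the control flow.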

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.