Patent · US Active

Target speaker mode

US12217761B2 · kind B2 · utility

Cited by: 0
References: 4
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Oct 31, 2021
Grant date: Feb 4, 2025
Priority date
Expiry date: Oct 31, 2041

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G10L2021/02087
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, relate to target speaker extraction. A target speaker extraction system receives an audio frame of an audio signal. A multi-speaker detection model analyzes the audio frame to determine whether it contains a single speaker or multiple speakers. When the audio frame contains only a single speaker, the system inputs the frame to a target speaker voice activity detection (VAD) model, which suppresses speech from a non-target speaker by comparing the frame to a voiceprint of the target speaker. When the audio frame contains multiple speakers, the system inputs the frame to a speech separation model, which separates the voice of the target speaker from the voice mixture in the frame.
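The per-frame routing described in the abstract can be sketched roughly as follows. This is an illustrative outline only, not the patented implementation: the class name `TargetSpeakerExtractor`, its field names, and the callable signatures are all hypothetical, and the three models are passed in as plain functions because the record does not specify their architecture.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Frame = Sequence[float]       # one audio frame as raw samples (assumed representation)
Voiceprint = Sequence[float]  # target speaker embedding (assumed representation)


@dataclass
class TargetSpeakerExtractor:
    """Hypothetical wrapper that routes each frame to one of two models."""

    # Returns True when the frame contains more than one speaker.
    is_multi_speaker: Callable[[Frame], bool]
    # Single-speaker path: suppresses the frame if it does not match the voiceprint.
    target_vad: Callable[[Frame, Voiceprint], List[float]]
    # Multi-speaker path: pulls the target voice out of the mixture.
    separate_target: Callable[[Frame, Voiceprint], List[float]]
    voiceprint: Voiceprint

    def process(self, frame: Frame) -> List[float]:
        # The detection model decides which downstream model sees the frame.
        if self.is_multi_speaker(frame):
            return self.separate_target(frame, self.voiceprint)
        return self.target_vad(frame, self.voiceprint)
```

Keeping the models as injected callables is a choice made for this sketch: it shows the branching without assuming anything about how detection, VAD, or separation are actually computed.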

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.