Target speaker mode
US12217761B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Oct 31, 2021 |
| Grant date | Feb 4, 2025 |
| Priority date | — |
| Expiry date | Oct 31, 2041 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L2021/02087
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, relate to a method for target speaker extraction. A target speaker extraction system receives an audio frame of an audio signal. A multi-speaker detection model analyzes the audio frame to determine whether it includes only a single speaker or multiple speakers. When the audio frame includes only a single speaker, the system inputs the audio frame to a target speaker voice activity detection (VAD) model, which suppresses speech from a non-target speaker by comparing the audio frame to a voiceprint of the target speaker. When the audio frame includes multiple speakers, the system inputs the audio frame to a speech separation model, which separates the voice of the target speaker from the voice mixture in the audio frame.
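The per-frame routing described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the function name `route_frame`, the callable parameters, and the list-of-floats frame and voiceprint representations are all hypothetical, and the three models are passed in as placeholders. Passing the voiceprint to the separation model is also an assumption; the abstract states only that the VAD model uses it.

```python
from typing import Callable, Sequence

Frame = Sequence[float]       # hypothetical: one audio frame as samples
Voiceprint = Sequence[float]  # hypothetical: target speaker embedding


def route_frame(
    frame: Frame,
    voiceprint: Voiceprint,
    is_multi_speaker: Callable[[Frame], bool],
    target_vad: Callable[[Frame, Voiceprint], Frame],
    separate: Callable[[Frame, Voiceprint], Frame],
) -> Frame:
    """Send one frame to the model matching its detected speaker count."""
    if is_multi_speaker(frame):
        # Multiple speakers: pull the target voice out of the mixture.
        # (Conditioning on the voiceprint here is an assumption.)
        return separate(frame, voiceprint)
    # Single speaker: keep the frame only if it matches the target voiceprint.
    return target_vad(frame, voiceprint)


if __name__ == "__main__":
    # Toy stand-ins for the three models, for illustration only.
    silence = lambda frame, vp: [0.0] * len(frame)   # VAD rejects the speaker
    halve = lambda frame, vp: [s * 0.5 for s in frame]  # fake "separation"
    print(route_frame([0.2, 0.4], [1.0], lambda f: False, silence, halve))
    print(route_frame([0.2, 0.4], [1.0], lambda f: True, silence, halve))
```

Keeping the models as injected callables separates the routing decision from any particular detector, VAD, or separation network, so each can be swapped or tested in isolation.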
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.