Array geometry agnostic multi-channel personalized speech enhancement
US12230259B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Dec 17, 2021 |
| Grant date | Feb 18, 2025 |
| Priority date | — |
| Expiry date | Apr 7, 2043 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L2021/02087
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Examples of array geometry agnostic multi-channel personalized speech enhancement (PSE) extract speaker embeddings, which represent acoustic characteristics of one or more target speakers, from target speaker enrollment data. Spatial features (e.g., inter-channel phase difference) are extracted from input audio captured by a microphone array. The input audio includes a mixture of speech data of the target speaker(s) and one or more interfering speaker(s). The input audio, the extracted speaker embeddings, and the extracted spatial features are provided to a trained geometry-agnostic PSE model. Output data is produced, which comprises estimated clean speech data of the target speaker(s) that has a reduction (or elimination) of speech data of the interfering speaker(s), without the trained PSE model requiring geometry information for the microphone array.
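The abstract names inter-channel phase difference (IPD) as an example spatial feature. The following is a minimal sketch of how such a feature could be computed with NumPy; it is not taken from the patent. The function names `stft` and `ipd_features`, the frame length, the hop size, the choice of channel 0 as reference, and the averaging of cos/sin IPD over microphone pairs (used here so the feature shape does not depend on the number of microphones) are all illustrative assumptions.

```python
import numpy as np


def stft(x, frame_len=512, hop=256):
    """Short-time Fourier transform of multi-channel audio.

    x: real array of shape (channels, samples).
    Returns a complex array of shape (channels, frames, frame_len // 2 + 1).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (x.shape[-1] - frame_len) // hop
    # Slice overlapping frames and stack them along a new "frames" axis.
    frames = np.stack(
        [x[..., i * hop : i * hop + frame_len] for i in range(n_frames)],
        axis=-2,
    )
    return np.fft.rfft(frames * window, axis=-1)


def ipd_features(audio, ref=0, frame_len=512, hop=256):
    """Hypothetical IPD feature extractor (assumption, not the patented method).

    audio: real array of shape (channels, samples), channels >= 2.
    Returns an array of shape (2, frames, bins) holding the mean cos(IPD)
    and mean sin(IPD) over all non-reference channels.
    """
    spec = stft(audio, frame_len, hop)
    others = np.delete(spec, ref, axis=0)
    # Phase of X_m * conj(X_ref) is the phase difference between channels.
    ipd = np.angle(others * np.conj(spec[ref]))
    # Averaging over microphone pairs gives a fixed-size feature for any
    # channel count; this pooling step is an assumption of this sketch.
    cos_ipd = np.cos(ipd).mean(axis=0)
    sin_ipd = np.sin(ipd).mean(axis=0)
    return np.stack([cos_ipd, sin_ipd], axis=0)
```

Under these assumptions, a 3-microphone and a 7-microphone recording of the same length yield features of identical shape, which is one simple way a downstream model could be fed spatial information without being told the array layout.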
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.