Optimizing personal VAD for on-device speech recognition
US12347438B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Mar 17, 2023 |
| Grant date | Jul 1, 2025 |
| Priority date | — |
| Expiry date | Jan 12, 2044 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L17/18
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A computer-implemented method includes receiving a sequence of acoustic frames corresponding to an utterance and generating a reference speaker embedding for the utterance. The method also includes receiving a target speaker embedding for a target speaker and generating feature-wise linear modulation (FiLM) parameters including a scaling vector and a shifting vector based on the target speaker embedding. The method also includes generating an affine transformation output that scales and shifts the reference speaker embedding based on the FiLM parameters. The method also includes generating a classification output indicating whether the utterance was spoken by the target speaker based on the affine transformation output.
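The steps in the abstract can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the patented implementation: the function names, the single dense layers used to produce the scaling and shifting vectors, the mean-pooling stand-in for the speaker encoder, and the sigmoid output layer are all assumptions introduced here, since the abstract does not specify them.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, bias: Vector, x: Sequence[float]) -> Vector:
    """Dense layer: one dot product per weight row, plus the matching bias."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def mean_pool(frames: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical stand-in for a speaker encoder: average the acoustic
    frames over time to get one reference embedding for the utterance."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def film_parameters(
    target_emb: Sequence[float],
    w_scale: Matrix,
    b_scale: Vector,
    w_shift: Matrix,
    b_shift: Vector,
) -> Tuple[Vector, Vector]:
    """Derive the FiLM scaling and shifting vectors from the target speaker
    embedding (assumed here to be one dense layer each)."""
    scale = linear(w_scale, b_scale, target_emb)
    shift = linear(w_shift, b_shift, target_emb)
    return scale, shift


def film_modulate(
    reference_emb: Sequence[float], scale: Sequence[float], shift: Sequence[float]
) -> Vector:
    """Affine transformation: element-wise scale * reference + shift."""
    return [g * r + b for g, r, b in zip(scale, reference_emb, shift)]


def classify(modulated: Sequence[float], w_out: Sequence[float], b_out: float) -> float:
    """Assumed output layer: sigmoid score that the target speaker spoke."""
    logit = sum(w * m for w, m in zip(w_out, modulated)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))
```

In this sketch the target speaker embedding never touches the reference embedding directly. It only sets the per-dimension scale and shift, which is the defining property of feature-wise linear modulation.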
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.