Predicting word boundaries for on-device batching of end-to-end speech recognition models
US12322383B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Sep 21, 2022 |
| Grant date | Jun 3, 2025 |
| Priority date | — |
| Expiry date | Jun 2, 2043 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N3/09
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method includes receiving a sequence of input audio frames corresponding to an utterance captured by a user device, the utterance including a plurality of words. For each input audio frame, the method includes predicting, using a word boundary detection model configured to receive the sequence of input audio frames as input, whether the input audio frame is a word boundary. The method includes batching the input audio frames into a plurality of batches based on the input audio frames predicted as word boundaries, wherein each batch includes a corresponding plurality of batched input audio frames. For each of the plurality of batches, the method includes processing, using a speech recognition model, the corresponding plurality of batched input audio frames in parallel to generate a speech recognition result.
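The flow described in the abstract (predict a boundary flag per frame, cut the frame sequence at those flags, then hand each batch to a recognizer as a unit) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: `predict_boundaries` is a toy energy threshold standing in for the word boundary detection model, `recognize_batches` accepts any callable in place of the speech recognition model, and the choice to close a batch on the boundary frame itself (rather than before it) is an assumption.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = Sequence[float]  # one audio frame as a sequence of samples or features
Result = TypeVar("Result")


def predict_boundaries(frames: Sequence[Frame], threshold: float = 0.05) -> List[bool]:
    """Toy stand-in for a word boundary detection model.

    Flags a frame as a boundary when its mean absolute amplitude falls
    below `threshold`. A real detector would be a learned model; this
    heuristic only makes the sketch runnable.
    """
    flags = []
    for frame in frames:
        energy = sum(abs(x) for x in frame) / len(frame) if len(frame) else 0.0
        flags.append(energy < threshold)
    return flags


def batch_by_word_boundaries(
    frames: Sequence[Frame], is_boundary: Sequence[bool]
) -> List[List[Frame]]:
    """Split frames into batches, closing a batch at each boundary frame.

    Assumption: the boundary frame is kept as the last frame of its batch.
    Trailing frames after the final boundary form one last batch.
    """
    batches: List[List[Frame]] = []
    current: List[Frame] = []
    for frame, boundary in zip(frames, is_boundary):
        current.append(frame)
        if boundary:
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches


def recognize_batches(
    batches: Sequence[List[Frame]], recognizer: Callable[[List[Frame]], Result]
) -> List[Result]:
    """Pass each batch to the recognizer as a whole.

    The recognizer receives all frames of a batch at once, so it is free
    to process them in parallel internally.
    """
    return [recognizer(batch) for batch in batches]


if __name__ == "__main__":
    frames = [[0.1], [0.2], [0.0], [0.3], [0.4]]
    flags = predict_boundaries(frames)
    batches = batch_by_word_boundaries(frames, flags)
    # The lambda is a placeholder recognizer that just reports batch size.
    print(recognize_batches(batches, lambda batch: len(batch)))
```

In this sketch the batch size follows the predicted word boundaries rather than a fixed frame count, which is the property the abstract emphasizes.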
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.