Patent · US Active

Predicting word boundaries for on-device batching of end-to-end speech recognition models

US12322383B2 · kind B2 · utility

Cited by: 0
References: 0
Claims: 26
Family size: 0

Assignee

Inventors

Key dates

Filing date: Sep 21, 2022
Grant date: Jun 3, 2025
Priority date
Expiry date: Jun 2, 2043

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06N3/09
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method includes receiving a sequence of input audio frames corresponding to an utterance captured by a user device, the utterance including a plurality of words. For each input audio frame, the method includes predicting, using a word boundary detection model configured to receive the sequence of input audio frames as input, whether the input audio frame is a word boundary. The method includes batching the input audio frames into a plurality of batches based on the input audio frames predicted as word boundaries, wherein each batch includes a corresponding plurality of batched input audio frames. For each of the plurality of batches, the method includes processing, using a speech recognition model, the corresponding plurality of batched input audio frames in parallel to generate a speech recognition result.
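
The flow described in the abstract (predict a boundary flag per frame, cut the frame sequence into batches at those flags, then hand each batch to the recognizer as a unit) can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the function names, the `Frame` type, and both stand-in models are hypothetical, and the choice that a boundary frame closes the current batch (rather than opening the next one) is an assumption the abstract does not settle.

```python
from typing import Callable, List, Sequence

# Hypothetical frame type: one feature vector per audio frame.
Frame = Sequence[float]


def batch_by_word_boundaries(
    frames: Sequence[Frame], is_boundary: Sequence[bool]
) -> List[List[Frame]]:
    """Split frames into batches, closing a batch at each predicted boundary.

    Assumption: the boundary frame is kept as the last frame of its batch.
    """
    batches: List[List[Frame]] = []
    current: List[Frame] = []
    for frame, boundary in zip(frames, is_boundary):
        current.append(frame)
        if boundary:
            batches.append(current)
            current = []
    if current:  # trailing frames after the last predicted boundary
        batches.append(current)
    return batches


def recognize_utterance(
    frames: Sequence[Frame],
    predict_boundary: Callable[[Sequence[Frame], int], bool],
    recognize_batch: Callable[[List[Frame]], object],
) -> List[object]:
    """Predict boundaries, batch the frames, and run the recognizer per batch.

    predict_boundary sees the whole frame sequence plus the index of the frame
    being classified. recognize_batch receives all frames of a batch at once,
    standing in for a model that processes them in parallel.
    """
    flags = [predict_boundary(frames, i) for i in range(len(frames))]
    batches = batch_by_word_boundaries(frames, flags)
    return [recognize_batch(batch) for batch in batches]


# Toy stand-ins (NOT real models): a near-silent frame counts as a boundary,
# and the "recognizer" just reports how many frames it was given.
def toy_boundary(frames: Sequence[Frame], i: int) -> bool:
    return sum(abs(x) for x in frames[i]) < 0.05


def toy_recognizer(batch: List[Frame]) -> int:
    return len(batch)


example_frames = [[0.1], [0.2], [0.0], [0.3], [0.4], [0.0], [0.5]]
example_results = recognize_utterance(example_frames, toy_boundary, toy_recognizer)
```

With the toy stand-ins, the seven example frames are cut after the two near-silent frames, so the recognizer is called on batches of three, three, and one frame. A real system would replace `toy_boundary` with the trained word boundary detection model and `toy_recognizer` with the speech recognition model.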

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.