Patent · US Active

Using aligned text and speech representations to train automatic speech recognition models without transcribed speech data

US12400638B2 · kind B2 · utility

0Cited by
0References
24Claims
0Family size

Assignee

Inventors

Key dates

Filing dateJul 20, 2023
Grant dateAug 26, 2025
Priority date
Expiry dateMar 17, 2044

Classification

  • Technology area (CPC G)Physics
  • CPC primaryG10L15/16
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

A method includes receiving training data that includes unspoken textual utterances in a target language. Each unspoken textual utterance not paired with any corresponding spoken utterance of non-synthetic speech. The method also includes generating a corresponding alignment output for each unspoken textual utterance using an alignment model trained on transcribed speech utterance in one or more training languages each different than the target language. The method also includes generating a corresponding encoded textual representation for each alignment output using a text encoder and training a speech recognition model on the encoded textual representations generated for the alignment outputs. Training the speech recognition model teaches the speech recognition model to learn how to recognize speech in the target language.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.