Patent · US Active

Using aligned text and speech representations to train automatic speech recognition models without transcribed speech data

US12400638B2 · kind B2 · utility

Cited by	0

References	0

Claims	24

Family size	0

Assignee

Google LLC · US

Inventors

Andrew Rosenberg · Brooklyn, US
Zhehuai Chen · Edgewater, US
Ankur Bapna · Sunnyvale, US
Yu Zhang · Mountain View, US
Bhuvana Ramabhadran · Campion Road, US

Key dates

Filing date	Jul 20, 2023
Grant date	Aug 26, 2025
Priority date	—
Expiry date	Mar 17, 2044

Classification

Technology area (CPC G)	Physics
CPC primary	G10L15/16
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A method includes receiving training data that includes unspoken textual utterances in a target language. Each unspoken textual utterance is not paired with any corresponding spoken utterance of non-synthetic speech. The method also includes generating a corresponding alignment output for each unspoken textual utterance using an alignment model trained on transcribed speech utterances in one or more training languages, each different from the target language. The method also includes generating a corresponding encoded textual representation for each alignment output using a text encoder, and training a speech recognition model on the encoded textual representations generated for the alignment outputs. Training the speech recognition model teaches it to recognize speech in the target language.
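
As an illustration only, the data flow described in the abstract (text, then alignment output, then encoded textual representation, then training input) can be sketched in a few lines of Python. This is a toy under stated assumptions, not the patented implementation: the function names `align`, `encode_text`, and `build_training_pairs` are hypothetical, a fixed per-character duration table stands in for the alignment model that the abstract says is trained on transcribed speech in other languages, and a one-hot lookup stands in for the text encoder. The sketch stops at preparing (input, target) pairs and does not train a recognizer.

```python
from typing import Dict, List, Tuple

Frame = List[float]


def align(text: str, durations: Dict[str, int], default: int = 1) -> List[str]:
    """Toy alignment step: repeat each character for its assumed number of frames.

    Assumption: a fixed duration table replaces a learned alignment model.
    """
    frames: List[str] = []
    for ch in text:
        frames.extend([ch] * durations.get(ch, default))
    return frames


def encode_text(frames: List[str], vocab: Dict[str, int]) -> List[Frame]:
    """Toy text encoder: map each frame-level token to a one-hot vector."""
    size = len(vocab)
    return [[1.0 if vocab[tok] == i else 0.0 for i in range(size)] for tok in frames]


def build_training_pairs(
    texts: List[str], durations: Dict[str, int]
) -> List[Tuple[List[Frame], str]]:
    """Pair each encoded, frame-level text representation with its text target."""
    # Character vocabulary built from the unspoken text itself (hypothetical choice).
    vocab = {ch: i for i, ch in enumerate(sorted(set("".join(texts))))}
    return [(encode_text(align(t, durations), vocab), t) for t in texts]


# Two unspoken textual utterances; "a" is assumed to last two frames.
pairs = build_training_pairs(["ab", "ba"], {"a": 2})
for encoded, target in pairs:
    print(target, encoded)
```

In this sketch the repeated frames play the role of the alignment output: they stretch the text to a speech-like length so that a recognizer could consume the encoded sequence in place of audio features.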

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.