Using aligned text and speech representations to train automatic speech recognition models without transcribed speech data
US12400638B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jul 20, 2023 |
| Grant date | Aug 26, 2025 |
| Priority date | — |
| Expiry date | Mar 17, 2044 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L15/16
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method includes receiving training data that includes unspoken textual utterances in a target language. Each unspoken textual utterance is not paired with any corresponding spoken utterance of non-synthetic speech. The method also includes generating a corresponding alignment output for each unspoken textual utterance using an alignment model trained on transcribed speech utterances in one or more training languages, each different from the target language. The method also includes generating a corresponding encoded textual representation for each alignment output using a text encoder, and training a speech recognition model on the encoded textual representations generated for the alignment outputs. Training the speech recognition model teaches it to recognize speech in the target language.
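The data flow described in the abstract can be illustrated with a short Python sketch. This is a toy under stated assumptions, not the patented implementation: the function names (`align`, `encode_text`, `build_training_examples`), the per-token duration function standing in for the alignment model, and the one-hot vectors standing in for the text encoder are all hypothetical and do not come from the patent.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def align(tokens: Sequence[str], predict_duration: Callable[[str], int]) -> List[str]:
    """Toy alignment output: repeat each token for its predicted frame count.

    `predict_duration` is a hypothetical stand-in for an alignment model
    trained on transcribed speech in languages other than the target one.
    """
    frames: List[str] = []
    for tok in tokens:
        # Every token occupies at least one frame.
        frames.extend([tok] * max(1, predict_duration(tok)))
    return frames


def encode_text(frames: Sequence[str], vocab: Dict[str, int]) -> List[Vector]:
    """Toy text encoder (assumption): one one-hot vector per aligned frame."""
    dim = len(vocab)
    encoded: List[Vector] = []
    for tok in frames:
        vec = [0.0] * dim
        vec[vocab[tok]] = 1.0
        encoded.append(vec)
    return encoded


def build_training_examples(
    utterances: Sequence[str],
    predict_duration: Callable[[str], int],
    vocab: Dict[str, int],
) -> List[Tuple[List[Vector], str]]:
    """Turn text-only utterances into (encoded representation, target text) pairs."""
    examples = []
    for text in utterances:
        tokens = list(text)  # character-level tokens, chosen for simplicity
        frames = align(tokens, predict_duration)
        examples.append((encode_text(frames, vocab), text))
    return examples


# Usage: two unspoken textual utterances, no audio involved.
vocab = {ch: i for i, ch in enumerate("abc")}
examples = build_training_examples(["ab", "cab"], lambda tok: 2, vocab)
```

In a full system, the resulting pairs would feed the training loop of a speech recognition model; that step is omitted here.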
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.