Supervised and unsupervised training with contrastive loss over sequences
US12230249B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Mar 22, 2022 |
| Grant date | Feb 18, 2025 |
| Priority date | — |
| Expiry date | Jul 11, 2043 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L2015/0635
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method includes receiving audio data corresponding to an utterance and generating a pair of positive audio data examples. Here, each positive audio data example includes a respective augmented copy of the received audio data. For each respective positive audio data example, the method includes generating a respective sequence of encoder outputs and projecting the respective sequence of encoder outputs for the positive audio data example into a contrastive loss space. The method also includes determining an L2 distance between each corresponding encoder output in the projected sequences of encoder outputs for the positive audio data examples and determining a per-utterance consistency loss by averaging the L2 distances. The method also includes generating corresponding speech recognition results for each respective positive audio data example. The method also includes updating parameters of the speech recognition model based on a respective supervised loss term and the per-utterance consistency loss.
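The consistency loss described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the plain-list representation of projected encoder outputs, and the weighted-sum combination with the supervised losses (including the `weight` parameter) are all assumptions made for illustration. The abstract only states that the parameter update is based on both kinds of loss.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance between two projected encoder outputs."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def per_utterance_consistency_loss(
    proj_a: Sequence[Vector], proj_b: Sequence[Vector]
) -> float:
    """Average the L2 distances between corresponding projected outputs.

    proj_a and proj_b are the projected encoder output sequences for the
    two positive (augmented) examples of the same utterance.
    """
    if len(proj_a) != len(proj_b) or len(proj_a) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    distances = [l2_distance(u, v) for u, v in zip(proj_a, proj_b)]
    return sum(distances) / len(distances)


def total_loss(
    supervised_a: float,
    supervised_b: float,
    consistency: float,
    weight: float = 1.0,  # hypothetical weighting, not taken from the source
) -> float:
    """Assumed combination: sum of supervised terms plus weighted consistency."""
    return supervised_a + supervised_b + weight * consistency


# Two time steps, two-dimensional projections (toy values).
proj_a = [[0.0, 0.0], [1.0, 1.0]]
proj_b = [[3.0, 4.0], [1.0, 1.0]]
consistency = per_utterance_consistency_loss(proj_a, proj_b)
print(consistency)  # 2.5 (distances are 5.0 and 0.0)
print(total_loss(1.0, 2.0, consistency, weight=0.5))  # 4.25
```

In a real training loop the projections would come from the speech recognition model's encoder followed by a projection layer, and the combined loss would drive a gradient update; both are omitted here.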
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.