Patent · US Active

Generating representations of speech signals using self-supervised learning

US11551668B1 · kind B1 · utility

Cited by	8

References	5

Claims	19

Family size	0

Assignee

Meta Platforms, Inc. · US

Inventors

Alexei Baevski · Redwood City, US
Yuhao Zhou · New York, US
Abdelrahman S. A. Mohamed · Redmond, US
Michael Auli · Menlo Park, US
Ronan Collobert · Mountain View, US
Alexis Conneau · Foster City, US

Key dates

Filing date	Dec 30, 2020
Grant date	Jan 10, 2023
Priority date	—
Expiry date	Jun 9, 2041

Classification

Technology area (CPC G)	Physics
CPC primary	G10L15/26
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

In one embodiment, a method includes generating audio segments from a speech signal, generating latent representations that respectively correspond to the audio segments, the latent representations comprising a first subset and a second subset, generating quantized representations that respectively correspond to the latent representations, masking the second subset of the latent representations, using a machine-learning model to process the first subset of the latent representations and the masked second subset of the latent representations to generate contextualized representations that respectively correspond to the latent representations, pre-training the machine-learning model based on comparisons between (1) a subset of the contextualized representations that respectively correspond to the masked second subset of the latent representations and (2) a subset of the quantized representations that respectively correspond to the masked second subset of the latent representations, and training the pre-trained machine-learning model to perform a speech analysis task.
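
The pre-training steps named in the abstract (segment, encode, quantize, mask, contextualize, compare) can be illustrated with a minimal NumPy sketch. It is not the patented implementation. Everything specific here is an assumption made for illustration: the frame length and hop, the random linear "encoder", the nearest-neighbour codebook lookup, the random positional offsets, the single self-attention layer standing in for the machine-learning model, and the cosine-similarity contrastive loss used as the "comparison". The sketch only computes the loss once; gradient updates and the final training on a speech analysis task are omitted.

```python
import numpy as np

def segment(signal, frame_len, hop):
    """Split a 1-D signal into overlapping fixed-length audio segments."""
    n = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n)])

def quantize(latents, codebook):
    """Map each latent vector to its nearest codebook entry (assumed scheme)."""
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return codebook[dists.argmin(axis=1)]

def contextualize(x, w_q, w_k, w_v):
    """One self-attention layer, a stand-in for the context model."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(x.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ v

def masked_contrastive_loss(context, targets, masked_idx, temperature=0.1):
    """Assumed comparison: cross-entropy over cosine similarities.

    For each masked position the positive is its own quantized target;
    the quantized targets of the other masked positions act as distractors.
    """
    c = context[masked_idx]
    t = targets[masked_idx]
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    t = t / np.linalg.norm(t, axis=1, keepdims=True)
    logits = c @ t.T / temperature
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
dim = 16

# 1. Audio segments from a (synthetic) speech signal.
signal = rng.standard_normal(16000)
frames = segment(signal, frame_len=400, hop=320)

# 2. Latent representations, one per segment (random linear encoder).
encoder = rng.standard_normal((400, dim)) / np.sqrt(400)
latents = frames @ encoder

# 3. Quantized representations, computed from the unmasked latents.
codebook = rng.standard_normal((8, dim))
quantized = quantize(latents, codebook)

# 4. Mask the second subset of the latents with a shared mask vector.
masked_idx = rng.choice(len(latents), size=10, replace=False)
mask_embedding = rng.standard_normal(dim)
model_input = latents.copy()
model_input[masked_idx] = mask_embedding
# Assumed positional offsets so that masked positions are distinguishable.
model_input += 0.1 * rng.standard_normal(model_input.shape)

# 5. Contextualized representations for all positions.
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3))
context = contextualize(model_input, w_q, w_k, w_v)

# 6. Compare context vectors and quantized targets at masked positions only.
loss = masked_contrastive_loss(context, quantized, masked_idx)
print(f"contrastive loss on masked positions: {loss:.4f}")
```

In an actual pre-training loop, this loss would be minimized with respect to the encoder, codebook, and context-model parameters, and the resulting model would then be trained on a labeled speech analysis task.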

Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.