Patent · US Active

Unified speech representation learning

US11735171B2 · kind B2 · utility

1 Cited by

6 References

20 Claims

0 Family size

Assignee

MICROSOFT TECHNOLOGY LICENSING, LLC · US

Inventors

Yao Qian · Dublin, US
Yu WU · Hoherlehmer Straße, CN
Kenichi Kumatani · Sammamish, US
Shujie Liu · Cupertino, US
Furu WEI · Redmond, US
Nanshan Zeng · Bellevue, US
Xuedong Huang · Bellevue, US
Chengyi Wang · Beijing, CN

Key dates

Filing date	May 14, 2021
Grant date	Aug 22, 2023
Priority date	—
Expiry date	Feb 8, 2042

Classification

Technology area (CPC G)	Physics
CPC primary	G10L2015/025
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Systems and methods are provided for training a machine learning model to learn speech representations. Labeled speech data, or both labeled and unlabeled data sets, are applied to a feature extractor of a machine learning model to generate latent speech representations. The latent speech representations are applied to a quantizer to generate quantized latent speech representations and to a transformer context network to generate contextual representations. Each of the contextual representations is aligned with a phoneme label to generate phonetically aware contextual representations. The quantized latent speech representations are aligned with phoneme labels to generate phonetically aware latent speech representations. The systems and methods also include randomly replacing a subset of the contextual representations with quantized latent speech representations during their alignment to phoneme labels, and aligning the phonetically aware latent speech representations to the contextual representations using supervised learning.
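
The quantization and random-replacement steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The nearest-codebook quantizer, the function names `quantize` and `swap_in_quantized`, and the `replace_prob` parameter are all assumptions introduced for illustration; the abstract does not specify how the quantizer works or how the replaced subset is chosen.

```python
import random

def quantize(latents, codebook):
    """Map each latent vector to its nearest codebook entry.

    Assumption: a nearest-neighbour lookup under squared Euclidean
    distance stands in for the quantizer, which the abstract leaves open.
    """
    def nearest(vec):
        return min(
            codebook,
            key=lambda code: sum((a - b) ** 2 for a, b in zip(vec, code)),
        )
    return [nearest(vec) for vec in latents]

def swap_in_quantized(contextual, quantized, replace_prob, rng):
    """Randomly replace contextual vectors with the quantized latent
    from the same frame.

    `replace_prob` is a hypothetical per-frame replacement probability.
    Returns the mixed sequence and a per-frame flag marking replacements.
    """
    if len(contextual) != len(quantized):
        raise ValueError("sequences must have one vector per frame")
    mixed, replaced = [], []
    for ctx_vec, q_vec in zip(contextual, quantized):
        use_quantized = rng.random() < replace_prob
        mixed.append(q_vec if use_quantized else ctx_vec)
        replaced.append(use_quantized)
    return mixed, replaced

# Toy data: three frames of 2-dimensional vectors (made-up values).
latents = [[0.1, 0.0], [0.9, 1.2], [0.2, 0.1]]
codebook = [[0.0, 0.0], [1.0, 1.0]]
contextual = [[0.5, 0.5], [0.6, 0.4], [0.3, 0.7]]

quantized = quantize(latents, codebook)
mixed, replaced = swap_in_quantized(
    contextual, quantized, replace_prob=0.5, rng=random.Random(0)
)
```

In a training loop, the `mixed` sequence would then be aligned with phoneme labels under a supervised loss, so that the contextual and quantized representations are both trained against the same phoneme targets.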

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.