Patent · US Active

Systems and methods for human listening and live captioning

US11922963B2 · kind B2 · utility

Cited by	0

References	0

Claims	22

Family size	0

Assignee

MICROSOFT TECHNOLOGY LICENSING, LLC · US

Inventors

Xiaofei Wang · Cedar Grove, US
Sefik Emre ESKIMEZ · Bellevue, US
Min Tang · Boulder, US
Hemin Yang · Bellevue, US
Zirun Zhu · Bellevue, US
Zhuo Chen · Markham, CA
Huaming Wang · Qingdao, CN
Takuya Yoshioka · Bellevue, US

Key dates

Filing date	May 26, 2021
Grant date	Mar 5, 2024
Priority date	—
Expiry date	May 26, 2041

Classification

Technology area (CPC G)	Physics
CPC primary	G10L25/51
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Systems and methods are provided for generating and operating a speech enhancement model optimized for generating noise-suppressed speech outputs for improved human listening and live captioning. A computing system obtains a speech enhancement model trained on a first training dataset to generate noise-suppressed speech outputs and an automatic speech recognition model trained on a second training dataset to generate transcription labels for spoken language utterances. A third training dataset comprising a set of spoken language utterances is applied to the speech enhancement model to obtain a first noise-suppressed speech output which is applied to the automatic speech recognition model to generate a noise-suppressed transcription output for the set of spoken language utterances. Speech enhancement model parameters are updated to optimize the speech enhancement model to generate optimized noise-suppressed speech outputs based on a comparison of the noise-suppressed transcription output and ground truth transcription labels.
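The training flow in the abstract can be illustrated with a toy sketch. It is not the patented implementation. Every name here (`enhance`, `asr_log_probs`, `asr_loss`, `train_enhancer`, `ASR_WEIGHTS`, `DATASET`) is hypothetical, and the models are stand-ins: the speech enhancement model is a per-dimension gain, the automatic speech recognition model is a fixed linear classifier, and a finite-difference gradient replaces backpropagation. Keeping the recognizer's weights fixed during the update is an assumption of this sketch. The abstract states only that the speech enhancement parameters are updated.

```python
import math

# Hypothetical pretrained ASR model: a fixed linear classifier, one weight row per label.
# Assumption: these weights are held constant while the enhancer is tuned.
ASR_WEIGHTS = [[2.0, -1.0], [-1.0, 2.0]]

def enhance(gains, noisy):
    """Toy speech enhancement model: apply a per-dimension gain to noisy features."""
    return [g * x for g, x in zip(gains, noisy)]

def asr_log_probs(features):
    """Toy ASR model: linear logits followed by a numerically stable log-softmax."""
    logits = [sum(w * f for w, f in zip(row, features)) for row in ASR_WEIGHTS]
    top = max(logits)
    log_z = top + math.log(sum(math.exp(l - top) for l in logits))
    return [l - log_z for l in logits]

def asr_loss(gains, dataset):
    """Mean cross-entropy comparing the ASR output on enhanced speech
    with the ground-truth transcription labels."""
    total = 0.0
    for noisy, label in dataset:
        total -= asr_log_probs(enhance(gains, noisy))[label]
    return total / len(dataset)

def train_enhancer(gains, dataset, steps=200, lr=0.1, eps=1e-5):
    """Gradient descent on the enhancement parameters only.
    Central differences stand in for backpropagation through the ASR model."""
    gains = list(gains)
    for _ in range(steps):
        grad = []
        for i in range(len(gains)):
            up = gains[:]
            up[i] += eps
            down = gains[:]
            down[i] -= eps
            grad.append((asr_loss(up, dataset) - asr_loss(down, dataset)) / (2 * eps))
        gains = [g - lr * d for g, d in zip(gains, grad)]
    return gains

# Toy "third training dataset": (noisy features, ground-truth label index).
DATASET = [([1.0, 0.6], 0), ([0.5, 1.0], 1), ([0.9, 0.4], 0), ([0.3, 0.8], 1)]

initial_gains = [0.1, 0.1]
tuned_gains = train_enhancer(initial_gains, DATASET)
print("ASR loss before:", asr_loss(initial_gains, DATASET))
print("ASR loss after: ", asr_loss(tuned_gains, DATASET))
```

In this sketch the transcription loss is the only training signal, so the enhancer learns whatever output the recognizer scores best. Whether that output also sounds better to a human listener is not something the toy example measures.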

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.