Patent · US Active

Hypothesis stitcher for speech recognition of long-form audio

US11574639B2 · kind B2 · utility

Cited by	0

References	0

Claims	20

Family size	0

Assignee

MICROSOFT TECHNOLOGY LICENSING, LLC · US

Inventors

Naoyuki KANDA · Bellevue, US
Xuankai Chang · Baltimore, US
Yashesh GAUR · Redmond, US
Xiaofei Wang · Cedar Grove, US
Zhong Meng · Seattle, US
Takuya Yoshioka · Bellevue, US

Key dates

Filing date	Dec 18, 2020
Grant date	Feb 7, 2023
Priority date	—
Expiry date	May 13, 2041

Classification

Technology area (CPC G)	Physics
CPC primary	G10L21/0272
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A hypothesis stitcher for speech recognition of long-form audio provides superior performance, such as higher accuracy and reduced computational cost. An example disclosed operation includes: segmenting the audio stream into a plurality of audio segments; identifying a plurality of speakers within each of the plurality of audio segments; performing automatic speech recognition (ASR) on each of the plurality of audio segments to generate a plurality of short-segment hypotheses; merging at least a portion of the short-segment hypotheses into a first merged hypothesis set; inserting stitching symbols into the first merged hypothesis set, the stitching symbols including a window change (WC) symbol; and consolidating, with a network-based hypothesis stitcher, the first merged hypothesis set into a first consolidated hypothesis. Multiple variations are disclosed, including alignment-based stitchers and serialized stitchers, which may operate as speaker-specific stitchers or multi-speaker stitchers, and may further support multiple options for differing hypothesis configurations.
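The pre-stitching steps in the abstract can be sketched in code: segmenting the audio, merging short-segment hypotheses, and inserting a window change (WC) symbol. This is a minimal sketch under stated assumptions, not the patented method. It assumes fixed-length overlapping windows, assumes the hypotheses are already attributed to speaker labels, and assumes `<WC>` as the token string. The names `Segment`, `segment_audio`, and `merge_with_wc` are hypothetical, and so are the window and shift values. The network-based hypothesis stitcher that consolidates the merged set is not shown.

```python
from dataclasses import dataclass

# Hypothetical token string for the window change (WC) stitching symbol.
WC = "<WC>"


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds


def segment_audio(duration: float, window: float, shift: float) -> list[Segment]:
    """Split [0, duration) into fixed-length windows (an assumed scheme).

    A shift smaller than the window produces overlapping segments; the last
    window is truncated at the end of the audio.
    """
    if window <= 0 or shift <= 0:
        raise ValueError("window and shift must be positive")
    segments: list[Segment] = []
    start = 0.0
    while start < duration:
        segments.append(Segment(start, min(start + window, duration)))
        if start + window >= duration:
            break  # this window already reaches the end of the audio
        start += shift
    return segments


def merge_with_wc(hyps_per_segment: list[dict[str, list[str]]]) -> dict[str, list[str]]:
    """Merge short-segment hypotheses into one token sequence per speaker.

    hyps_per_segment[i] maps a speaker label to that speaker's hypothesis
    (a token list) in segment i. A WC symbol is inserted at every window
    boundary, even when a speaker is silent in a window, so all speakers'
    sequences carry the same number of WC symbols.
    """
    speakers = sorted({spk for hyps in hyps_per_segment for spk in hyps})
    merged: dict[str, list[str]] = {}
    for spk in speakers:
        seq: list[str] = []
        for i, hyps in enumerate(hyps_per_segment):
            if i > 0:
                seq.append(WC)
            seq.extend(hyps.get(spk, []))
        merged[spk] = seq
    return merged


if __name__ == "__main__":
    for seg in segment_audio(duration=20.0, window=8.0, shift=4.0):
        print(seg)
    hyps = [
        {"A": ["hello", "there"]},
        {"A": ["there", "how"], "B": ["hi"]},
        {"B": ["hi", "bob"]},
    ]
    print(merge_with_wc(hyps))
```

In this example, the word "there" is duplicated across overlapping windows for speaker A, and "hi" is duplicated for speaker B. That kind of redundancy is what a consolidation step would need to resolve when it produces a single hypothesis from the merged set.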

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.