Patent · US Active

Speaker attributed transcript generation

US11322148B2 · kind B2 · utility

Cited by	1

References	3

Claims	19

Family size	0

Assignee

MICROSOFT TECHNOLOGY LICENSING, LLC · US

Inventors

Takuya Yoshioka · Bellevue, US
Andreas Stolcke · Berkeley, US
Zhuo Chen · Markham, CA
Dimitrios Dimitriadis · Rutherford, US
Nanshan Zeng · Bellevue, US
Lijuan Qin · Redmond, US
William Isaac Hinthorn · Seattle, US
Xuedong Huang · Bellevue, US

Key dates

Filing date	Apr 30, 2019
Grant date	May 3, 2022
Priority date	—
Expiry date	Dec 5, 2039

Classification

Technology area (CPC H)	Electricity
CPC primary	H04M2201/41
WIPO field	Digital communication
WIPO sector	Electrical engineering

Abstract

A computer-implemented method processes audio streams recorded during a meeting by a plurality of distributed devices. Operations include performing speech recognition on each audio stream by a corresponding speech recognition system to generate utterance-level posterior probabilities as hypotheses for each audio stream, aligning the hypotheses and formatting them as word confusion networks with associated word-level posterior probabilities, performing speaker recognition on each audio stream by a speaker identification algorithm that generates a stream of speaker-attributed word hypotheses, formatting speaker hypotheses with associated speaker label posterior probabilities and speaker-attributed hypotheses for each audio stream as a speaker confusion network, aligning the word and speaker confusion networks from all audio streams to each other to merge the posterior probabilities and align word and speaker labels, and creating a best speaker-attributed word transcript by selecting the sequence of word and speaker labels with the highest posterior probabilities.
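
The final two operations in the abstract (merging posteriors across streams, then selecting the highest-posterior word and speaker labels) can be illustrated with a minimal Python sketch. This is not the patented implementation. It assumes the confusion networks from all devices are already aligned slot by slot and have equal length, so the alignment step itself is omitted. It also assumes a plain unweighted average as the merge rule. The names `merge_slots`, `best_transcript`, the `<eps>` null-word token, and the example data are all hypothetical.

```python
from collections import defaultdict


def merge_slots(slots):
    """Average the posterior distributions of one aligned slot across streams.

    Assumption: an unweighted mean is used as the merge rule; the abstract
    does not specify how posteriors are combined.
    """
    merged = defaultdict(float)
    for dist in slots:
        for label, posterior in dist.items():
            merged[label] += posterior / len(slots)
    return dict(merged)


def best_transcript(word_cns, speaker_cns, null_word="<eps>"):
    """Pick the highest-posterior word and speaker label in every slot.

    word_cns / speaker_cns: one confusion network per audio stream, each a
    list of slots, each slot a dict mapping label -> posterior probability.
    Assumption: the networks are pre-aligned, so slot i of every stream
    refers to the same position in the meeting.
    """
    transcript = []
    # zip(*networks) groups slot i of every stream together.
    for word_slots, speaker_slots in zip(zip(*word_cns), zip(*speaker_cns)):
        words = merge_slots(word_slots)
        speakers = merge_slots(speaker_slots)
        word = max(words, key=words.get)
        speaker = max(speakers, key=speakers.get)
        if word != null_word:  # skip slots where "no word" wins
            transcript.append((speaker, word))
    return transcript


# Hypothetical example: two devices, two aligned slots each.
word_cns = [
    [{"hello": 0.6, "yellow": 0.4}, {"world": 0.9, "<eps>": 0.1}],
    [{"hello": 0.7, "hollow": 0.3}, {"world": 0.5, "word": 0.5}],
]
speaker_cns = [
    [{"alice": 0.8, "bob": 0.2}, {"alice": 0.55, "bob": 0.45}],
    [{"alice": 0.6, "bob": 0.4}, {"alice": 0.7, "bob": 0.3}],
]

result = best_transcript(word_cns, speaker_cns)
print(result)
```

In this toy input the second device is undecided between "world" and "word" in the second slot, and the evidence from the first device resolves it after merging. Averaging is only one plausible way to combine posteriors; a weighted combination based on per-device confidence would fit the same structure.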

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.