Patent · US Active

Training and using a transcript generation model on a multi-speaker audio stream

US11984127B2 · kind B2 · utility

Cited by: 0
References: 0
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Dec 31, 2021
Grant date: May 14, 2024
Priority date
Expiry date: Nov 13, 2042

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G10L17/00
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

The disclosure herein describes using a transcript generation model to generate a transcript from a multi-speaker audio stream. Audio data including overlapping speech of a plurality of speakers is obtained, and a set of frame embeddings is generated from audio data frames of the obtained audio data using an audio data encoder. A set of words and channel change (CC) symbols is generated from the set of frame embeddings using a transcript generation model. The CC symbols are included between pairs of adjacent words that are spoken by different people at the same time. The set of words and CC symbols is transformed into a plurality of transcript lines, wherein words of the set of words are sorted into transcript lines based on the CC symbols, and a multi-speaker transcript is generated based on the plurality of transcript lines. The inclusion of CC symbols by the model enables efficient, accurate multi-speaker transcription.
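
The step of sorting words into transcript lines based on CC symbols can be illustrated with a minimal sketch. It is not taken from the patent claims and rests on several assumptions: the token name `<cc>`, the function name `split_by_channel_change`, a fixed number of channels (two by default), and the rule that each CC symbol moves the following words to the next channel in round-robin order are all hypothetical choices made for illustration.

```python
from typing import List

# Hypothetical token name; the abstract only refers to "channel change (CC) symbols".
CC = "<cc>"


def split_by_channel_change(tokens: List[str], num_channels: int = 2) -> List[str]:
    """Sort words into per-channel transcript lines.

    Assumption: every CC symbol switches the active channel to the next one
    (round-robin), so with two channels each CC symbol toggles between them.
    """
    lines: List[List[str]] = [[] for _ in range(num_channels)]
    channel = 0  # assume the first word belongs to channel 0
    for token in tokens:
        if token == CC:
            # A CC symbol carries no text; it only changes the active channel.
            channel = (channel + 1) % num_channels
        else:
            lines[channel].append(token)
    return [" ".join(words) for words in lines]


# Two overlapping utterances interleaved in a single token stream.
stream = ["how", "are", CC, "I", CC, "you", CC, "am", "fine"]
transcript_lines = split_by_channel_change(stream)
for line in transcript_lines:
    print(line)
```

Under these assumptions, the interleaved stream is separated into one line per channel, "how are you" and "I am fine". Mapping channels to speaker identities and ordering lines in the final multi-speaker transcript are outside the scope of this sketch.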

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.