Patent · US Active

Monaural multi-talker speech recognition with attention mechanism and gated convolutional networks

US10699700B2 · kind B2 · utility

Cited by: 2
References: 1
Claims: 17
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jul 31, 2018
Grant date: Jun 30, 2020
Priority date
Expiry date: Dec 29, 2038

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G10L2015/025
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Provided are a speech recognition training processing method and an apparatus including the same. The speech recognition training processing method includes acquiring multi-talker mixed speech sequence data corresponding to a plurality of speakers, encoding the multi-talker mixed speech sequence data into embedded sequence data, generating speaker-specific context vectors at each frame based on the embedded sequence data, generating senone posteriors for each of the speakers based on the speaker-specific context vectors, and updating an acoustic model by performing permutation invariant training (PIT) model training based on the senone posteriors.
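
The final step of the abstract, permutation invariant training, can be illustrated with a minimal sketch of the general PIT criterion. It is not the patented implementation: the function names `cross_entropy` and `pit_loss`, the toy posteriors, and the use of plain Python lists are all assumptions made for illustration, and the encoder and attention stages that would produce the senone posteriors are omitted. The sketch scores every assignment of output streams to reference speakers and keeps the assignment with the lowest average cross-entropy.

```python
import itertools
import math


def cross_entropy(posteriors, labels):
    """Mean negative log-likelihood of frame-level senone labels.

    posteriors: list of frames, each a list of probabilities over senones.
    labels: list of senone indices, one per frame.
    """
    eps = 1e-12  # guards against log(0)
    total = -sum(math.log(frame[label] + eps)
                 for frame, label in zip(posteriors, labels))
    return total / len(labels)


def pit_loss(stream_posteriors, speaker_labels):
    """Return (lowest loss, best permutation) over all stream-to-speaker
    assignments. best_perm[s] is the speaker assigned to output stream s.
    Hypothetical helper, not an API from the patent."""
    n = len(stream_posteriors)
    best_loss, best_perm = float("inf"), None
    for perm in itertools.permutations(range(n)):
        loss = sum(cross_entropy(stream_posteriors[s], speaker_labels[perm[s]])
                   for s in range(n)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy data (assumed): 2 output streams, 2 frames, 3 senones.
posteriors = [
    [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]],  # output stream 0
    [[0.1, 0.1, 0.8], [0.8, 0.1, 0.1]],  # output stream 1
]
labels = [
    [2, 0],  # reference senone labels for speaker 0
    [0, 1],  # reference senone labels for speaker 1
]

loss, perm = pit_loss(posteriors, labels)
print(perm, round(loss, 4))
```

In this toy case the lower-loss assignment pairs output stream 0 with speaker 1 and output stream 1 with speaker 0. In training, the acoustic model would be updated using the loss of that best assignment. Enumerating all permutations costs factorial time in the number of speakers, which is tolerable only for the small speaker counts typical of this setting.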

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.