Patent · US Active

Monaural multi-talker speech recognition with attention mechanism and gated convolutional networks

US10699700B2 · kind B2 · utility

Cited by	2

References	1

Claims	17

Family size	0

Assignee

TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED · CN

Inventors

Yanmin QIAN · Shanghai, CN
Dong YU · Zhejiang, CN

Key dates

Filing date	Jul 31, 2018
Grant date	Jun 30, 2020
Priority date	—
Expiry date	Dec 29, 2038

Classification

Technology area (CPC G)	Physics
CPC primary	G10L2015/025
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Provided are a speech recognition training processing method and an apparatus including the same. The speech recognition training processing method includes acquiring multi-talker mixed speech sequence data corresponding to a plurality of speakers, encoding the multi-talker mixed speech sequence data into embedded sequence data, generating speaker-specific context vectors at each frame based on the embedded sequence data, generating senone posteriors for each of the speakers based on the speaker-specific context vectors, and updating an acoustic model by performing permutation invariant training (PIT) based on the senone posteriors.
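
The final step of the abstract, PIT over per-speaker senone posteriors, can be illustrated with a minimal sketch. This is not the patent's implementation: the function name `pit_cross_entropy`, the nested-list data layout, and the use of frame-averaged cross-entropy as the criterion are all assumptions made for illustration. The idea shown is only the general one behind PIT: score every assignment of output streams to reference speakers and keep the lowest-loss assignment.

```python
import itertools
import math


def pit_cross_entropy(posteriors, labels):
    """Hypothetical helper: utterance-level PIT with a cross-entropy criterion.

    posteriors[s][t][k]: posterior of senone k at frame t for output stream s.
    labels[p][t]: reference senone id at frame t for speaker p.
    Returns (lowest average loss, permutation), where permutation[s] is the
    speaker assigned to output stream s.
    """
    num_streams = len(posteriors)
    num_frames = len(labels[0])
    best_loss, best_perm = math.inf, None
    # Try every assignment of output streams to reference speakers.
    for perm in itertools.permutations(range(num_streams)):
        total = 0.0
        for stream, speaker in enumerate(perm):
            for t in range(num_frames):
                prob = posteriors[stream][t][labels[speaker][t]]
                total -= math.log(prob + 1e-12)  # small constant avoids log(0)
        loss = total / (num_streams * num_frames)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy data (made up): two speakers, three frames, three senones.
labels = [[0, 1, 2], [2, 2, 0]]
# Output stream 0 is confident about speaker 1's labels, stream 1 about speaker 0's.
posteriors = [
    [[0.05, 0.05, 0.90], [0.05, 0.05, 0.90], [0.90, 0.05, 0.05]],
    [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]],
]
loss, perm = pit_cross_entropy(posteriors, labels)
```

In this toy case the swapped assignment wins, so `perm` maps stream 0 to speaker 1 and stream 1 to speaker 0. In training, the selected loss would be the one backpropagated to update the acoustic model; enumerating permutations is practical here because the number of simultaneous talkers is small.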

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.