Code-switching speech recognition with end-to-end connectionist temporal classification model
US10964309B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | May 13, 2019 |
| Grant date | Mar 30, 2021 |
| Priority date | — |
| Expiry date | May 13, 2039 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L2015/0635
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A code-switching (CS) connectionist temporal classification (CTC) model may be initialized from a major language CTC model by keeping the network hidden weights and replacing the output tokens with a union of major and secondary language output tokens. The initialized model may be trained by updating its parameters with training data from both languages, and a language identification (LID) model may also be trained with the data. During decoding, each of a series of audio frames is processed in turn: if silence dominates the current frame, a silence output token may be emitted. Otherwise, the major language output token posterior vector from the CS CTC model may be multiplied by the LID major language probability to create a probability vector for the major language. A similar step is performed for the secondary language, and the system may emit the output token with the highest probability across all tokens from both languages.
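The per-frame decoding rule in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The function name `decode_frame`, the `<sil>` label, the dictionary representation of posteriors, and the `silence_threshold` parameter are all hypothetical. In particular, the abstract does not define what "silence dominates" means, so the sketch assumes it means the silence posterior exceeds a fixed threshold. It also assumes the two token sets are disjoint and that each frame has at least one non-silence token.

```python
from typing import Dict, Set

SILENCE = "<sil>"  # hypothetical label for the silence output token


def decode_frame(
    posteriors: Dict[str, float],
    major_tokens: Set[str],
    secondary_tokens: Set[str],
    lid_major: float,
    lid_secondary: float,
    silence_threshold: float = 0.5,  # assumption: not specified in the abstract
) -> str:
    """Pick one output token for a single audio frame."""
    # Assumption: "silence dominates" means its posterior exceeds the threshold.
    if posteriors.get(SILENCE, 0.0) > silence_threshold:
        return SILENCE

    # Scale each language's token posteriors by the LID probability
    # of that language for this frame.
    scored: Dict[str, float] = {}
    for token, p in posteriors.items():
        if token in major_tokens:
            scored[token] = p * lid_major
        elif token in secondary_tokens:
            scored[token] = p * lid_secondary

    # Emit the token with the highest scaled probability across both languages.
    return max(scored, key=lambda t: scored[t])


# Example: the secondary language wins because the LID model favors it.
frame = {"<sil>": 0.1, "en_a": 0.5, "zh_b": 0.4}
print(decode_frame(frame, {"en_a"}, {"zh_b"}, lid_major=0.3, lid_secondary=0.7))
```

In the example, the major language token has the higher raw posterior (0.5 against 0.4), but after scaling by the LID probabilities the secondary token scores 0.28 against 0.15, so it is the one emitted.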
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.