Contextualized streaming end-to-end speech recognition with trie-based deep biasing and shallow fusion
US12087306B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Nov 24, 2021 |
| Grant date | Sep 10, 2024 |
| Priority date | — |
| Expiry date | Jun 30, 2042 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G10L15/183
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
In one embodiment, a method includes receiving a user's utterance comprising a word in a custom vocabulary list of the user, generating a previous token to represent a previous audio portion of the utterance, and generating a current token to represent a current audio portion of the utterance. The current token is generated by:
- generating a bias embedding by using the previous token to query a trie of wordpieces representing the custom vocabulary list;
- generating first probabilities of respective first candidate tokens likely uttered in the current audio portion, based on the bias embedding and the current audio portion;
- generating second probabilities of respective second candidate tokens likely uttered after the previous token, based on the previous token and the bias embedding; and
- generating the current token to represent the current audio portion of the utterance, based on the first probabilities of the first candidate tokens and the second probabilities of the second candidate tokens.
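The decoding step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the learned bias embedding is replaced here by a simple probability boost for wordpieces that the trie allows after the previous token, and the two probability sets are combined with a log-linear weighting. All names (`build_trie`, `query_trie`, `apply_bias`, `decode_step`), the `boost` and `lm_weight` parameters, the wordpieces, and the probability values are hypothetical and chosen only for illustration.

```python
import math

END = "<end>"  # hypothetical marker for the end of a vocabulary entry


def build_trie(entries):
    """Build a nested-dict trie from vocabulary entries given as wordpiece lists."""
    root = {}
    for pieces in entries:
        node = root
        for piece in pieces:
            node = node.setdefault(piece, {})
        node[END] = {}
    return root


def query_trie(root, prefix):
    """Return the wordpieces that may follow `prefix`; empty set if off-trie."""
    node = root
    for piece in prefix:
        if piece not in node:
            return set()
        node = node[piece]
    return {piece for piece in node if piece != END}


def apply_bias(probs, boosted, boost=3.0):
    """Stand-in for the bias embedding: scale boosted tokens, then renormalize."""
    scaled = {t: p * (boost if t in boosted else 1.0) for t, p in probs.items()}
    total = sum(scaled.values())
    return {t: p / total for t, p in scaled.items()}


def decode_step(trie, prev_tokens, acoustic_probs, lm_probs,
                lm_weight=0.5, boost=3.0):
    """Pick the current token from biased acoustic and language-model scores."""
    boosted = query_trie(trie, prev_tokens)            # trie query with previous token
    first = apply_bias(acoustic_probs, boosted, boost)  # "first probabilities"
    second = apply_bias(lm_probs, boosted, boost)       # "second probabilities"
    floor = 1e-9  # avoids log(0) for tokens missing from one distribution
    scores = {
        t: math.log(first.get(t, floor)) + lm_weight * math.log(second.get(t, floor))
        for t in sorted(set(first) | set(second))
    }
    return max(scores, key=scores.get)


# Assumed custom vocabulary: the name "Kaitlyn" split into two wordpieces.
trie = build_trie([["_kait", "lyn"]])
acoustic = {"lin": 0.5, "lyn": 0.4, "land": 0.1}  # made-up acoustic scores
lm = {"lin": 0.3, "lyn": 0.3, "land": 0.4}        # made-up language-model scores

print(decode_step(trie, ["_kait"], acoustic, lm))             # lyn
print(decode_step(trie, ["_kait"], acoustic, lm, boost=1.0))  # lin
```

With the boost disabled (`boost=1.0`) the sketch picks the acoustically likelier `lin`; with biasing on, the trie match after `_kait` tips the choice to `lyn`, which is the effect contextual biasing is meant to have on custom-vocabulary words.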
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.