Patent · US Active

Contextualized streaming end-to-end speech recognition with trie-based deep biasing and shallow fusion

US12087306B1 · kind B1 · utility

Cited by: 2
References: 2
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Nov 24, 2021
Grant date: Sep 10, 2024
Priority date
Expiry date: Jun 30, 2042

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G10L15/183
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

In one embodiment, a method includes receiving a user's utterance comprising a word in a custom vocabulary list of the user, generating a previous token to represent a previous audio portion of the utterance, and generating a current token to represent a current audio portion of the utterance by: generating a bias embedding by using the previous token to query a trie of wordpieces representing the custom vocabulary list; generating first probabilities of respective first candidate tokens likely uttered in the current audio portion based on the bias embedding and the current audio portion; generating second probabilities of respective second candidate tokens likely uttered after the previous token based on the previous token and the bias embedding; and generating the current token to represent the current audio portion of the utterance based on the first probabilities of the first candidate tokens and the second probabilities of the second candidate tokens.
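
The decoding step described in the abstract can be illustrated with a toy Python sketch. It is not the patented implementation: the neural components are replaced by hand-written score lists, the "bias embedding" is approximated by a multi-hot vector over the vocabulary, and the two probability sets are combined by log-linear interpolation (a common form of shallow fusion). All names (WordpieceTrie, bias_vector, decode_step, boost, lm_weight) and all numbers are assumptions made for illustration.

```python
import math


class WordpieceTrie:
    """Trie whose edges are wordpieces; each path spells one custom-vocabulary entry."""

    def __init__(self, entries):
        self.root = {}
        for pieces in entries:
            node = self.root
            for piece in pieces:
                node = node.setdefault(piece, {})

    def continuations(self, prefix):
        """Wordpieces that may follow `prefix`; restart at the root if the
        prefix leaves the trie or has just completed an entry."""
        node = self.root
        for piece in prefix:
            if piece not in node:
                return sorted(self.root)
            node = node[piece]
        return sorted(node) if node else sorted(self.root)


def bias_vector(trie, prefix, vocab):
    """Assumed stand-in for the bias embedding: 1.0 for each wordpiece the trie allows next."""
    allowed = set(trie.continuations(prefix))
    return [1.0 if tok in allowed else 0.0 for tok in vocab]


def biased_log_probs(scores, bias, boost):
    """Add `boost` to the scores of biased tokens, then apply a log-softmax."""
    logits = [s + boost * b for s, b in zip(scores, bias)]
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_z for x in logits]


def decode_step(trie, prefix, vocab, acoustic_scores, lm_scores, boost=1.0, lm_weight=0.3):
    """Pick the current token from biased acoustic and language-model probabilities."""
    bias = bias_vector(trie, prefix, vocab)
    first = biased_log_probs(acoustic_scores, bias, boost)   # "first probabilities"
    second = biased_log_probs(lm_scores, bias, boost)        # "second probabilities"
    fused = [a + lm_weight * l for a, l in zip(first, second)]
    best = max(range(len(vocab)), key=fused.__getitem__)
    return vocab[best]


# Toy data (hypothetical): the user's custom vocabulary contains "Joanna".
vocab = ["_call", "_jo", "anna", "hn", "_now"]
trie = WordpieceTrie([["_jo", "anna"]])
acoustic_scores = [0.0, 0.0, 1.0, 1.2, 0.0]  # audio alone slightly prefers "hn"
lm_scores = [0.0, 0.0, 0.5, 0.8, 0.0]        # language model also prefers "hn"

print(decode_step(trie, ["_jo"], vocab, acoustic_scores, lm_scores, boost=1.0))
print(decode_step(trie, ["_jo"], vocab, acoustic_scores, lm_scores, boost=0.0))
```

With these toy scores, the biased call selects "anna" (completing the custom entry) and the unbiased call selects "hn". In an actual system the scores would come from trained acoustic and language-model networks, and the bias embedding would be a learned representation rather than a multi-hot mask.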

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.