Patent · US Expired

Customized tokenization of domain specific text via rules corresponding to a speech recognition vocabulary

US6327561A · kind A · utility

Cited by: 69
References: 8
Claims: 17
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jul 7, 1999
Grant date: Dec 4, 2001
Priority date
Expiry date: Jul 7, 2019

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/284
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method for supporting customized tokenization of domain-specific text comprises the steps of: loading domain-specific tokenization rules corresponding to the customized tokenization of the domain-specific text; tokenizing the domain-specific text using the loaded domain-specific tokenization rules; and, further tokenizing the domain-specific text using general purpose tokenization rules. The loading step of the inventive method can comprise: loading a speech recognition vocabulary; and, loading domain-specific tokenization rules corresponding to the speech recognition vocabulary. In addition, the tokenizing step can comprise identifying each substring in the domain-specific text matching a regular expression having a corresponding replacement pattern in the loaded domain-specific tokenization rules, and replacing each substring identified in the identifying step with the replacement pattern corresponding to the matched regular expression. Alternatively, the tokenizing step can comprise identifying substrings in the domain-specific text matching a regular expression having a corresponding replacement pattern in the loaded domain-specific tokenization rules; excluding from …
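
The two-pass flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The rule table (DOMAIN_RULES), the general-purpose pattern (GENERAL_TOKEN), the function names, and the medical-dictation sample are all hypothetical and chosen only for illustration. The sketch covers the first tokenizing variant only (identify each matching substring and replace it); the alternative variant is truncated in the abstract and is not modeled here. Loading rules from a speech recognition vocabulary is likewise reduced to a hard-coded list.

```python
import re

# Hypothetical domain-specific rules: (regular expression, replacement pattern).
# These stand in for rules that would be loaded alongside a speech
# recognition vocabulary; they are assumptions, not taken from the patent.
DOMAIN_RULES = [
    (re.compile(r"(\d+)\s*mg\b"), r"\1 milligrams"),
    (re.compile(r"\bb\.i\.d\."), "twice daily"),
]

# Hypothetical general-purpose rule: a token is a run of word characters
# or a single punctuation character.
GENERAL_TOKEN = re.compile(r"\w+|[^\w\s]")


def apply_domain_rules(text, rules):
    """First pass: replace each substring matching a rule's regular
    expression with that rule's replacement pattern."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


def tokenize(text, rules=DOMAIN_RULES):
    """Run the domain-specific pass, then the general-purpose pass."""
    rewritten = apply_domain_rules(text, rules)
    return GENERAL_TOKEN.findall(rewritten)


if __name__ == "__main__":
    print(tokenize("Take 50mg b.i.d. with food."))
```

Running the domain-specific pass first matters in this sketch: a general-purpose tokenizer applied directly to the raw text would split "b.i.d." at its periods before the domain rule could recognize it.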

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.