Customized tokenization of domain specific text via rules corresponding to a speech recognition vocabulary
US6327561A · kind A · utility
Assignee
Inventors
Key dates
| Filing date | Jul 7, 1999 |
| Grant date | Dec 4, 2001 |
| Priority date | — |
| Expiry date | Jul 7, 2019 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG06F40/284
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
A method for supporting customized tokenization of domain-specific text acomprises the steps of: loading domain-specific tokenization rules corresponding to the customized tokenization of the domain-specific text; tokenizing the domain-specific text using the loaded domain-specific tokenization rules; and, further tokenizing the domain-specific text using general purpose tokenization rules. The loading step of the inventive method can comprise: loading a speech recognition vocabulary; and, loading domain-specific tokenization rules corresponding to the speech recognition vocabulary. In addition, the tokenizing step can comprise identifying each substring in the domain-specific text matching a regular expression having a corresponding replacement pattern in the loaded domain-specific tokenization rules, and replacing each substring identified in the identifying step with the replacement pattern corresponding to the matched regular expression. Alternatively, the tokenizing step can comprise identifying substrings in the domain-specific text matching a regular expression having a corresponding replacement pattern in the second loaded domain-specific tokenization rules; excluding from …
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.