Creating a terms dictionary with named entities or terminologies included in text data
US8538745B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jan 4, 2010 |
| Grant date | Sep 17, 2013 |
| Priority date | — |
| Expiry date | Jul 20, 2032 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/295
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A computer system of an embodiment of the disclosure can be used to automatically create or populate a terms dictionary using a set of computing units. A morphological analysis unit can acquire token sequence data by performing morphological analysis for the text data. A category distinguishing unit can distinguish tokens of the token sequence data by using a category dictionary to extract uncategorized words. An uncategorized-word comparing unit can compare each of the extracted uncategorized words with an uncategorized-word comparison rule to extract an uncategorized word matching the uncategorized-word comparison rule as a registration candidate word. A token-sequence comparing unit can compare a token sequence of the token sequence data with a token-sequence comparison rule to extract a token sequence matching the token-sequence comparison rule as registration candidate words. A permission unit can permit a user to select whether to register the registration candidate words in the category dictionary.
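The flow of units described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: the `Token` type, the capitalization-based part-of-speech guess standing in for morphological analysis, the sample rules, the sample sentence, and all function names are assumptions introduced here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    surface: str
    pos: str  # hypothetical tag set: "NOUN" or "OTHER"


def analyze(text):
    """Stand-in for morphological analysis (assumption: capitalized word = NOUN)."""
    words = re.findall(r"[A-Za-z0-9]+", text)
    return [Token(w, "NOUN" if w[0].isupper() else "OTHER") for w in words]


def uncategorized(tokens, category_dict):
    """Category distinguishing: keep tokens the category dictionary does not know."""
    return [t for t in tokens if t.surface not in category_dict]


def match_word_rule(words, pattern):
    """Uncategorized-word comparison: words matching a rule become candidates."""
    return [t.surface for t in words if pattern.fullmatch(t.surface)]


def match_sequence_rule(tokens, pos_pattern):
    """Token-sequence comparison: windows whose POS tags match become candidates."""
    n = len(pos_pattern)
    found = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if tuple(t.pos for t in window) == tuple(pos_pattern):
            found.append(" ".join(t.surface for t in window))
    return found


def register(terms, category, category_dict, approve):
    """Permission step: only terms the user approves are added to the dictionary."""
    for term in terms:
        if approve(term):
            category_dict[term] = category
    return category_dict


# Hypothetical sample data, for illustration only.
category_dict = {"Patients": "COMMON", "received": "COMMON", "after": "COMMON",
                 "visiting": "COMMON", "Clinic": "FACILITY"}
tokens = analyze("Patients received Zolmitriptan after visiting Mayo Clinic")
unknown = uncategorized(tokens, category_dict)

word_candidates = match_word_rule(unknown, re.compile(r"[A-Z][a-z]+triptan"))
sequence_candidates = match_sequence_rule(tokens, ("NOUN", "NOUN"))

# Simulated user decision: approve one candidate, reject the other.
user_approved = {"Mayo Clinic"}
register(word_candidates, "DRUG", category_dict, user_approved.__contains__)
register(sequence_candidates, "ORGANIZATION", category_dict, user_approved.__contains__)
```

In this sketch the approval callback plays the role of the permission unit, so nothing reaches the dictionary without an explicit user decision. A real system would replace `analyze` with a proper morphological analyzer and load the rules from configuration.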
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.