Patent · US Active

Creating a terms dictionary with named entities or terminologies included in text data

US8538745B2 · kind B2 · utility

3Cited by
1References
20Claims
0Family size

Assignee

Inventors

Key dates

Filing dateJan 4, 2010
Grant dateSep 17, 2013
Priority date
Expiry dateJul 20, 2032

Classification

  • Technology area (CPC G)Physics
  • CPC primaryG06F40/295
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

A computer system of an embodiment of the disclosure can be used to automatically create or populate a terms dictionary using a set of computing units. A morphological analysis unit can acquire token sequence data by performing morphological analysis for the text data. A category distinguishing unit can distinguish tokens of the token sequence data by using a category dictionary to extract uncategorized words. An uncategorized-word comparing unit can compare each of the extracted uncategorized words with an uncategorized-word comparison rule to extract an uncategorized word matching the uncategorized-word comparison rule as a registration candidate word. A token-sequence comparing unit can compare a token sequence of the token sequence data with a token-sequence comparison rule to extract a token sequence matching the token-sequence comparison rule as registration candidate words. A permission unit can permit a user to select whether to register the registration candidate words in the category dictionary.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.