Code, system and method for representing a natural-language text in a form suitable for text manipulation
US7386442B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jul 1, 2003 |
| Grant date | Jun 10, 2008 |
| Priority date | — |
| Expiry date | Jan 29, 2026 |
Classification
- Technology area (CPC Y)Emerging Cross-Sectional Technologies
- CPC primaryY10S707/99936
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
A computer method, system and code, for representing a natural-language document in a vector form suitable for text manipulation operations are disclosed. The method involves determining (a) for each of a plurality of terms selected from one of (i) non-generic words in the document, (ii) proximately arranged word groups in the document, and (iii) a combination of (i) and (ii), a selectivity value of the term related to the frequency of occurrence of that term in a library of texts in one field, relative to the frequency of occurrence of the same term in one or more other libraries of texts in one or more other fields, respectively. The document is represented as a vector of terms, where the coefficient assigned to each term includes a function of the selectivity value determined for that term.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.