Patent · US · Expired

Code, system and method for representing a natural-language text in a form suitable for text manipulation

US7386442B2 · kind B2 · utility

Cited by: 60
References: 28
Claims: 26
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jul 1, 2003
Grant date: Jun 10, 2008
Priority date
Expiry date: Jan 29, 2026

Classification

  • Technology area (CPC Y): Emerging Cross-Sectional Technologies
  • CPC primary: Y10S707/99936
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A computer method, system, and code for representing a natural-language document in a vector form suitable for text manipulation operations are disclosed. The method involves determining (a) for each of a plurality of terms selected from one of (i) non-generic words in the document, (ii) proximately arranged word groups in the document, and (iii) a combination of (i) and (ii), a selectivity value of the term related to the frequency of occurrence of that term in a library of texts in one field, relative to the frequency of occurrence of the same term in one or more other libraries of texts in one or more other fields, respectively. The document is represented as a vector of terms, where the coefficient assigned to each term includes a function of the selectivity value determined for that term.
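
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not give a formula, so every name and choice here is an assumption made for illustration. That covers the function names (`term_rate`, `selectivity`, `document_vector`), the use of document frequency as the "frequency of occurrence", the ratio against the highest rate among the other libraries, the `floor` guard against division by zero, and the coefficient defined as term count times selectivity. Only single words are handled; proximately arranged word groups are left out.

```python
from collections import Counter


def term_rate(term, library):
    """Fraction of texts in `library` (a list of token lists) that contain `term`."""
    if not library:
        return 0.0
    return sum(1 for text in library if term in text) / len(library)


def selectivity(term, field_library, other_libraries, floor=1e-6):
    """Assumed selectivity: rate in one field divided by the highest rate elsewhere.

    `floor` is a hypothetical guard so that a term absent from every other
    library gets a large finite value instead of a division by zero.
    """
    in_field = term_rate(term, field_library)
    elsewhere = max((term_rate(term, lib) for lib in other_libraries), default=0.0)
    return in_field / max(elsewhere, floor)


def document_vector(tokens, field_library, other_libraries, generic_words=frozenset()):
    """Map each non-generic word to an assumed coefficient: count * selectivity."""
    counts = Counter(t for t in tokens if t not in generic_words)
    return {
        term: n * selectivity(term, field_library, other_libraries)
        for term, n in counts.items()
    }
```

With a toy medical-device library and one electronics library, a word such as "device" that appears in both fields receives a modest coefficient, while a word found only in the target field dominates the vector. Other functions of the selectivity value (a logarithm or a root, for example) would fit the abstract's wording equally well.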

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.