Patent · US Expired

System and method for automatically classifying text

US7028250B2 · kind B2 · utility

Cited by: 62
References: 49
Claims: 11
Family size: 0

Assignee

Inventors

Key dates

Filing date: May 25, 2001
Grant date: Apr 11, 2006
Priority date
Expiry date: Jun 1, 2023

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/20
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method is provided for automatically classifying text into categories. In operation, a plurality of tokens or features are manually or automatically associated with each category. A weight is then coupled to each feature, wherein the weight indicates a degree of association between the feature and the category. Next, a document is parsed into a plurality of unique tokens with associated counts, wherein the counts are indicative of the number of times the feature appears in the document. A category score representative of a sum of products of each feature count in the document times the corresponding feature weight in the category for each document is then computed. Next, the category scores are sorted by perspective, and a document is classified into a particular category, provided the category score exceeds a predetermined threshold.
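
The scoring step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the category names, feature weights, tokenizer, and threshold value below are hypothetical, and sorting in descending score order is an assumed reading of the abstract's sorting step. Each category score is the sum, over the category's features, of the feature's count in the document times its weight in the category.

```python
import re
from collections import Counter

# Hypothetical categories: each maps a feature (token) to a weight that
# expresses how strongly the feature is associated with the category.
CATEGORY_WEIGHTS = {
    "sports": {"game": 1.0, "team": 1.5, "score": 0.5},
    "finance": {"stock": 2.0, "market": 1.0, "score": 0.2},
}


def tokenize(text):
    """Parse a document into unique tokens with their occurrence counts.

    Assumption: lowercase alphanumeric runs are a good enough token
    definition for illustration.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def category_scores(text, category_weights):
    """Score each category as sum(feature count * feature weight)."""
    counts = tokenize(text)
    return {
        category: sum(counts[feature] * weight
                      for feature, weight in weights.items())
        for category, weights in category_weights.items()
    }


def classify(text, category_weights, threshold):
    """Return (category, score) pairs whose score exceeds the threshold,
    sorted from highest to lowest score (an assumed ordering)."""
    scores = category_scores(text, category_weights)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(category, score) for category, score in ranked
            if score > threshold]


if __name__ == "__main__":
    document = "The team won the game; the team score was high."
    print(classify(document, CATEGORY_WEIGHTS, threshold=1.0))
```

In this sketch a document can fall into several categories, or into none, depending on how many scores exceed the threshold; the abstract does not say whether a single best category is selected.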

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.