Patent · US Active

Interactive cleaning for automatic document clustering and categorization

US7711747B2 · kind B2 · utility

Cited by: 38
References: 6
Claims: 23
Family size: 0

Assignee

Inventors

Key dates

Filing date: Apr 6, 2007
Grant date: May 4, 2010
Priority date
Expiry date: May 13, 2028

Classification

  • Technology area (CPC Y): Emerging Cross-Sectional Technologies
  • CPC primary: Y10S707/953
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Documents are clustered or categorized to generate a model associating documents with classes. Outlier measures are computed for the documents indicative of how well each document fits into the model. Outlier documents are identified to a user based on the outlier measures and a user selected outlier criterion. Ambiguity measures are computed for the documents indicative of a number of classes with which each document has similarity under the model. If a document is annotated with a label class, a possible corrective label class is identified if the annotated document has higher similarity with the possible corrective label class under the model than with the annotated label class. The clustering or categorizing is repeated, adjusted based on received user input, to generate an updated model associating documents with classes. Outlier and ambiguity measures are also calculated at runtime for new documents classified using the model.
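
The three per-document checks named in the abstract (outlier measure, ambiguity measure, possible corrective label) can be illustrated with a minimal Python sketch. The abstract does not give formulas, so everything here is an assumption made for illustration: each document is represented by a list of per-class similarity scores in [0, 1], the outlier measure is taken as one minus the best similarity, and the ambiguity measure is taken as the exponentiated entropy of the normalized similarities (an "effective number of classes"). The function names are hypothetical and are not drawn from the patent.

```python
import math


def outlier_measure(similarities):
    """Assumed fit score: 1 minus the best class similarity (higher = worse fit)."""
    return 1.0 - max(similarities)


def ambiguity_measure(similarities):
    """Assumed ambiguity score: exp(entropy) of the normalized similarities.

    Roughly 1.0 when one class dominates, roughly k when k classes tie.
    """
    total = sum(similarities)
    probs = [s / total for s in similarities if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)


def find_outliers(doc_similarities, threshold):
    """Return ids of documents whose outlier measure exceeds a user-chosen threshold."""
    return [
        doc_id
        for doc_id, sims in doc_similarities.items()
        if outlier_measure(sims) > threshold
    ]


def corrective_label(similarities, annotated_class):
    """Return a class index strictly more similar than the annotated one, else None."""
    best = max(range(len(similarities)), key=lambda c: similarities[c])
    if similarities[best] > similarities[annotated_class]:
        return best
    return None


# Hypothetical similarity scores for three documents over three classes.
docs = {
    "a": [0.9, 0.05, 0.05],  # fits class 0 well
    "b": [0.5, 0.5, 0.0],    # split between classes 0 and 1
    "c": [0.1, 0.1, 0.1],    # fits no class well
}
flagged = find_outliers(docs, threshold=0.6)
suggestion = corrective_label([0.2, 0.7, 0.1], annotated_class=0)
```

In this sketch the threshold plays the role of the user-selected outlier criterion, and the suggestion would be shown to the user for confirmation before the clustering or categorizing is repeated rather than applied automatically.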

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.