Patent · US Active

Simplifying lexicon creation in hybrid duplicate detection and inductive classifier systems

US8713014B1 · kind B1 · utility

Cited by: 9
References: 12
Claims: 23
Family size: 0

Assignee

Inventors

Key dates

Filing date: May 14, 2010
Grant date: Apr 29, 2014
Priority date
Expiry date: Jan 8, 2032

Classification

  • Technology area (CPC H): Electricity
  • CPC primary: H04L51/212
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A classification system includes a signature-based duplicate detector and an inductive classifier that share attribute information. Before duplicate detection and classification can be performed, the duplicate detector and the inductive classifier are initialized by generating a lexicon of attributes for the duplicate detector and a classification model for the classifier. To develop the classification model, the classifier uses a training set of documents of known class to determine which attributes of the documents are most useful in classifying an unknown document. The model is developed from these attributes. Attribute information containing the attributes determined by the classifier is then passed to the duplicate detector, which uses it to generate the lexicon of attributes.
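The initialization flow described in the abstract can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in rather than the patented method: the abstract does not specify how attributes are scored, how the model is built, or how signatures are computed. This sketch assumes whitespace tokens as attributes, a score equal to the spread of a token's per-class document rate, a simple rate-summing classifier, and a signature formed as a SHA-1 hash over the sorted lexicon tokens found in a document.

```python
import hashlib
from collections import defaultdict


def tokenize(text):
    # Assumption: attributes are lowercase whitespace-separated tokens.
    return text.lower().split()


def train(training_docs, top_k):
    """Select the top_k most class-discriminating tokens and build a model.

    training_docs is a list of (text, label) pairs of known class.
    Returns (model, attributes); the attributes are what gets shared
    with the duplicate detector.
    """
    class_sizes = defaultdict(int)
    doc_freq = defaultdict(lambda: defaultdict(int))  # token -> label -> count
    for text, label in training_docs:
        class_sizes[label] += 1
        for token in set(tokenize(text)):
            doc_freq[token][label] += 1

    # Fraction of each class's documents that contain the token.
    rates = {
        token: {label: counts.get(label, 0) / class_sizes[label]
                for label in class_sizes}
        for token, counts in doc_freq.items()
    }

    def score(token):
        # Hypothetical usefulness score: spread of the rate across classes.
        values = rates[token].values()
        return max(values) - min(values)

    # Ties are broken alphabetically so the selection is deterministic.
    ranked = sorted(rates, key=lambda t: (-score(t), t))
    attributes = ranked[:top_k]
    model = {token: rates[token] for token in attributes}
    return model, attributes


def classify(model, text):
    # Sum per-class rates of the model attributes present in the document.
    totals = defaultdict(float)
    for token in set(tokenize(text)):
        for label, rate in model.get(token, {}).items():
            totals[label] += rate
    return max(totals, key=totals.get) if totals else None


def build_lexicon(attributes):
    # The duplicate detector reuses the classifier's attributes as its lexicon.
    return frozenset(attributes)


def signature(lexicon, text):
    # Keep only lexicon tokens, then hash them in a canonical (sorted) order.
    kept = sorted(set(tokenize(text)) & lexicon)
    if not kept:
        return None  # no lexicon tokens, so no signature can be formed
    return hashlib.sha1(" ".join(kept).encode("utf-8")).hexdigest()
```

In this sketch, two documents that differ only in tokens outside the lexicon receive the same signature, so the duplicate detector needs no separately built lexicon: calling `train` once yields both the model for `classify` and the attribute list for `build_lexicon`.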

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.