Patent · US Active

Detecting duplicate documents using classification

US8180773B2 · kind B2 · utility

Cited by: 0
References: 6
Claims: 21
Family size: 0

Assignee

Inventors

Key dates

Filing date: May 27, 2009
Grant date: May 15, 2012
Priority date
Expiry date: Mar 11, 2030

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/355
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Systems, methods and articles of manufacture are disclosed for detecting a duplicate document. A plurality of documents may be assigned to categories, each category corresponding to a collection of duplicate or near-duplicate documents. A new document may be received. The new document may be evaluated against each category to determine a similarity score between the new document and each category. The new document may be identified as a duplicate based on the similarity scores and thresholds for each category. An action may then be performed on the duplicate based on duplication rules. The thresholds and duplication rules may be customized by a user.
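
The flow described in the abstract (score a new document against each category, compare each score to that category's threshold, then apply a rule) can be illustrated with a minimal Python sketch. This is not the patented method: the abstract does not say how similarity is computed, so the sketch assumes Jaccard similarity over three-word shingles, and every name in it (`Category`, `shingles`, `jaccard`, `detect_duplicate`, the category names, thresholds, and actions) is hypothetical.

```python
from dataclasses import dataclass, field


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased).

    Assumption: word shingles stand in for whatever features a real
    classifier would use; the abstract does not specify them.
    """
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


@dataclass
class Category:
    """A collection of duplicate or near-duplicate documents.

    threshold and action are the user-customizable parts: the minimum
    score that counts as a duplicate, and the rule to apply when it does.
    """
    name: str
    threshold: float
    action: str
    members: list = field(default_factory=list)

    def score(self, document):
        """Similarity between document and this category, taken here as
        the best match against any member (an assumption)."""
        doc = shingles(document)
        return max((jaccard(doc, shingles(m)) for m in self.members),
                   default=0.0)


def detect_duplicate(document, categories):
    """Return (category name, score, action) for the best-scoring category
    whose threshold is met, or None if the document is not a duplicate."""
    best = None
    for category in categories:
        s = category.score(document)
        if s >= category.threshold and (best is None or s > best[1]):
            best = (category, s)
    if best is None:
        return None
    category, s = best
    return category.name, s, category.action
```

A caller would build the categories once, with per-category thresholds and actions chosen by the user, and then pass each incoming document to `detect_duplicate`. A result of `None` means no category's threshold was met, so the document is treated as new.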

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.