Patent · US Active

Duplicate entry detection system and method

US8046372B1 · kind B1 · utility

Cited by: 32
References: 28
Claims: 38
Family size: 0

Assignee

Inventors

Key dates

Filing date: May 25, 2007
Grant date: Oct 25, 2011
Priority date
Expiry date: Jun 6, 2029

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/313
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A computer system and method for determining whether the subject matter described in a received document is substantially similar to the subject matter of other documents in a document corpus, such that the received document can be considered a duplicate document. After receiving a first document, a set of tokens for the first document is generated. A non-fielded relevance search on a token index is executed. The relevance search returns a set of candidate duplicate documents with scores corresponding to each candidate document. For each candidate document with a score above a threshold, filtering is performed on each candidate document to determine whether each candidate document is a true duplicate of the first document. A set of candidate documents with a score above the threshold that were not disqualified as candidate documents is then provided.
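The flow described in the abstract (tokenize, search a token index, apply a score threshold, then filter the surviving candidates) can be illustrated with a minimal Python sketch. The abstract does not say how relevance is scored or what the filtering step checks, so the TF-IDF sum and the Jaccard-overlap filter below are assumptions chosen for illustration, as are all function names, parameters, and default values. This is not the patented implementation.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Generate lowercase word tokens for a document."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_token_index(corpus):
    """Build a non-fielded inverted index: token -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in corpus.items():
        for token, count in Counter(tokenize(text)).items():
            index[token][doc_id] = count
    return index


def relevance_search(tokens, index, n_docs):
    """Score each document that shares a token with the query.

    Assumption: a plain TF-IDF sum stands in for the unspecified scoring.
    """
    scores = defaultdict(float)
    for token in set(tokens):
        postings = index.get(token, {})
        if not postings:
            continue
        idf = math.log(1 + n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return dict(scores)


def passes_filter(first_tokens, candidate_tokens, min_jaccard=0.6):
    """Hypothetical filter: keep candidates whose token sets overlap enough."""
    a, b = set(first_tokens), set(candidate_tokens)
    union = a | b
    if not union:
        return False
    return len(a & b) / len(union) >= min_jaccard


def find_duplicates(first_text, corpus, threshold=1.0, min_jaccard=0.6):
    """Return IDs of candidates above the threshold that survive filtering."""
    tokens = tokenize(first_text)
    index = build_token_index(corpus)
    scores = relevance_search(tokens, index, len(corpus))
    return sorted(
        doc_id
        for doc_id, score in scores.items()
        if score > threshold
        and passes_filter(tokens, tokenize(corpus[doc_id]), min_jaccard)
    )


corpus = {
    "a": "Printer driver crashes when printing PDF files",
    "b": "Printer driver crashes while printing PDF files",
    "c": "Login page shows wrong password error",
    "d": "Printer driver installation guide for new office printing "
         "hardware and PDF export",
}
first = "printer driver crashes when printing pdf files"
duplicates = find_duplicates(first, corpus)
```

In this toy corpus, document "d" shares enough tokens with the received document to score above the threshold, but the filter disqualifies it because most of its tokens are unrelated. Document "c" shares no tokens and never becomes a candidate.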

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.