Detecting duplicate documents using classification
US8180773B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | May 27, 2009 |
| Grant date | May 15, 2012 |
| Priority date | — |
| Expiry date | Mar 11, 2030 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/355
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Systems, methods and articles of manufacture are disclosed for detecting a duplicate document. A plurality of documents may be assigned to categories, each category corresponding to a collection of duplicate or near-duplicate documents. A new document may be received. The new document may be evaluated against each category to determine a similarity score between the new document and each category. The new document may be identified as a duplicate based on the similarity scores and thresholds for each category. An action may then be performed on the duplicate based on duplication rules. The thresholds and duplication rules may be customized by a user.
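The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how similarity is computed, so Jaccard similarity over three-word shingles is assumed here purely for illustration. The names `Category`, `shingles`, `jaccard`, and `detect_duplicate`, and the action strings `"discard"` and `"keep"`, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Tuple

Shingles = FrozenSet[Tuple[str, ...]]


def shingles(text: str, k: int = 3) -> Shingles:
    """Return the set of k-word shingles of a text (assumed representation)."""
    words = text.lower().split()
    if not words:
        return frozenset()
    if len(words) < k:
        return frozenset([tuple(words)])
    return frozenset(tuple(words[i:i + k]) for i in range(len(words) - k + 1))


def jaccard(a: Shingles, b: Shingles) -> float:
    """Jaccard similarity of two shingle sets (assumed similarity measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


@dataclass
class Category:
    """A collection of duplicate or near-duplicate documents."""
    name: str
    threshold: float  # user-customizable threshold for this category
    action: str       # user-customizable duplication rule (hypothetical labels)
    members: List[Shingles] = field(default_factory=list)

    def score(self, doc: Shingles) -> float:
        # Assumption: a category's score is its best-matching member's score.
        return max((jaccard(doc, m) for m in self.members), default=0.0)


def detect_duplicate(
    text: str, categories: List[Category]
) -> Tuple[Optional[str], str, Dict[str, float]]:
    """Score a new document against every category and apply its rule.

    Returns (matched category name or None, action, all similarity scores).
    """
    doc = shingles(text)
    scores = {c.name: c.score(doc) for c in categories}
    # Each category is compared against its own threshold.
    matches = [c for c in categories if scores[c.name] >= c.threshold]
    if not matches:
        return None, "keep", scores
    best = max(matches, key=lambda c: scores[c.name])
    return best.name, best.action, scores


# Example usage with made-up documents.
fox = Category(
    name="fox",
    threshold=0.5,
    action="discard",
    members=[shingles("the quick brown fox jumps over the lazy dog")],
)
result = detect_duplicate("the quick brown fox jumps over the lazy cat", [fox])
```

Keeping the threshold and the action on each category mirrors the abstract's statement that both may be customized by a user. A real system would likely use a trained classifier or a scalable signature scheme rather than pairwise set comparison.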
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.