Detecting duplicate and near-duplicate files
US6658423B1 · kind B1 · utility
Cited by: 549
References: 6
Claims: 38
Family size: 0
Assignee
Inventors
Key dates
| Filing date | Jan 24, 2001 |
| Grant date | Dec 2, 2003 |
| Priority date | — |
| Expiry date | Jan 6, 2022 |
Classification
- Technology area (CPC Y): Emerging Cross-Sectional Technologies
- CPC primary: Y10S707/99943
- WIPO field: Computer technology
- WIPO sectorElectrical engineering
Abstract
Improved duplicate and near-duplicate detection techniques may assign a number of fingerprints to a given document by (i) extracting parts from the document, (ii) assigning the extracted parts to one or more of a predetermined number of lists, and (iii) generating a fingerprint from each of the populated lists. Two documents may be considered near-duplicates if any one of their fingerprints matches.
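The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say what a "part" is, how parts are assigned to lists, or how a fingerprint is computed, so the sketch assumes lowercased words as parts, a SHA-1 hash of each word to choose exactly one list, and a SHA-1 digest of each list's contents as its fingerprint. The function names and the default of four lists are likewise hypothetical, as is the choice to compare fingerprints only between lists with the same index.

```python
import hashlib
import re


def extract_parts(text):
    """Step (i): extract parts. Assumption: a part is a lowercased word."""
    return re.findall(r"\w+", text.lower())


def list_index(part, num_lists):
    """Step (ii): choose a list for a part.

    Assumption: a stable hash of the part picks one list. hashlib is used
    because Python's built-in hash() of strings varies between runs.
    """
    digest = hashlib.sha1(part.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_lists


def fingerprints(text, num_lists=4):
    """Step (iii): return {list index: fingerprint} for each populated list."""
    lists = [[] for _ in range(num_lists)]
    for part in extract_parts(text):
        lists[list_index(part, num_lists)].append(part)
    return {
        i: hashlib.sha1(" ".join(parts).encode("utf-8")).hexdigest()
        for i, parts in enumerate(lists)
        if parts  # lists that received no parts yield no fingerprint
    }


def near_duplicates(text_a, text_b, num_lists=4):
    """Treat two documents as near-duplicates if any one fingerprint matches."""
    fa = fingerprints(text_a, num_lists)
    fb = fingerprints(text_b, num_lists)
    return any(fa[i] == fb[i] for i in fa.keys() & fb.keys())
```

Under these assumptions, `near_duplicates("Hello, world!", "hello world")` returns `True`, because both texts yield the same parts. A single changed word alters at most two lists (the one that lost the old word and the one that gained the new word), so two such documents still match whenever at least one other list is populated.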
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.