System for automatically managing duplicate documents when crawling dynamic documents
US7680773B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Mar 31, 2005 |
| Grant date | Mar 16, 2010 |
| Priority date | — |
| Expiry date | Mar 27, 2026 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG06F16/951
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
A system of reducing the possibility of crawling duplicate document identifiers partitions a plurality of document identifiers into multiple clusters, each cluster having a cluster name and a set of document parameters. The system generates an equivalence rule for each cluster of document identifiers, the rule specifying which document parameters associated with the cluster are content-relevant. Next, the system groups each cluster of document identifiers into one or more equivalence classes in accordance with its associated equivalence rule, each equivalence class including one or more document identifiers that correspond to a document content and having a representative document identifier identifying the document content.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.