Identifying duplicate documents from search results without comparing document content
US5913208A · kind A · utility
Assignee
Inventors
Key dates
| Filing date | Jul 9, 1996 |
| Grant date | Jun 15, 1999 |
| Priority date | — |
| Expiry date | Jul 9, 2016 |
Classification
- Technology area (CPC Y): Emerging Cross-Sectional Technologies
- CPC primary: Y10S707/99935
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A computer system has a document collection of one or more documents and one or more indexes that each include an inverted file with one or more terms. Each of the terms is associated with one or more document identifiers. The index further includes a document catalog that associates each of the document identifiers with one or more attributes, either intrinsic or non-intrinsic. A search engine process produces a hit list having one or more hit list entries. Each hit list entry, with one or more hit list attributes, is associated with one of the documents that is determined by the search engine to be relevant to the query. A formatter processor selects one or more of the hit list attributes, identified by a hit list attribute selector, and then compares the selected attributes of two or more entries on the hit list to determine whether or not documents associated with these entries are duplicate instances of one another. The determination can be made without examining the content of the document associated with the entries.
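The attribute comparison described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `HitListEntry` and `group_duplicates` and the attribute names `title` and `size` are hypothetical and chosen only for illustration, and exact equality of the selected attributes is assumed as the duplicate test. Document content is never read.

```python
from dataclasses import dataclass


@dataclass
class HitListEntry:
    """One hit list entry: a document identifier plus catalog attributes.

    Hypothetical structure; the attribute names used below are assumptions.
    """
    doc_id: str
    attributes: dict


def group_duplicates(hit_list, selected_attributes):
    """Group entries whose selected attributes all match exactly.

    Only attribute values are compared; document content is never opened.
    Entries missing any selected attribute are skipped rather than guessed at.
    Returns a list of groups, each a list of doc_ids, in hit list order.
    """
    groups = {}
    for entry in hit_list:
        if any(name not in entry.attributes for name in selected_attributes):
            continue  # cannot compare without every selected attribute
        key = tuple(entry.attributes[name] for name in selected_attributes)
        groups.setdefault(key, []).append(entry.doc_id)
    # Keep only keys shared by two or more entries.
    return [ids for ids in groups.values() if len(ids) > 1]


# Illustrative hit list; all values are made up.
hits = [
    HitListEntry("d1", {"title": "Annual Report", "size": 2048}),
    HitListEntry("d2", {"title": "Annual Report", "size": 2048}),
    HitListEntry("d3", {"title": "Annual Report", "size": 4096}),
    HitListEntry("d4", {"title": "Untitled"}),  # no size: skipped
]
duplicates = group_duplicates(hits, ["title", "size"])
print(duplicates)
```

In this sketch the list passed as `selected_attributes` plays the role of the hit list attribute selector: selecting more attributes makes the duplicate test stricter, selecting fewer makes it looser.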
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.