Patent · US Expired

Identifying duplicate documents from search results without comparing document content

US5913208A · kind A · utility

105 Cited by

11 References

35 Claims

0 Family size

Assignee

International Business Machines Corporation · US

Inventors

Eric W. Brown · Pittsburgh, US
John M. Prager · Stony Point, US

Key dates

Filing date	Jul 9, 1996
Grant date	Jun 15, 1999
Priority date	—
Expiry date	Jul 9, 2016

Classification

Technology area (CPC Y)	Emerging Cross-Sectional Technologies
CPC primary	Y10S707/99935
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A computer system has a document collection of one or more documents and one or more indexes that each include an inverted file with one or more terms. Each of the terms is associated with one or more document identifiers. The index further includes a document catalog that associates each of the document identifiers with one or more attributes, either intrinsic or non-intrinsic. A search engine process produces a hit list having one or more hit list entries. Each hit list entry, with one or more hit list attributes, is associated with one of the documents that is determined by the search engine to be relevant to the query. A formatter processor selects one or more of the hit list attributes, identified by a hit list attribute selector, and then compares the selected attributes of two or more entries on the hit list to determine whether or not the documents associated with these entries are duplicate instances of one another. The determination can be made without examining the content of the documents associated with the entries.
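
The comparison step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `HitListEntry` and `find_duplicate_groups`, the example attributes (`title`, `size`, `url`), and the exact-equality comparison rule are all assumptions made for illustration. The abstract does not say which attributes are selected or how they are compared. The sketch only shows the general idea of grouping hit list entries by selected catalog attributes without reading document content.

```python
from dataclasses import dataclass


@dataclass
class HitListEntry:
    """One hit list entry (hypothetical structure, not from the patent text)."""
    doc_id: int
    score: float
    attributes: dict  # catalog attributes carried on the hit list entry


def find_duplicate_groups(hit_list, selected_attributes):
    """Group entries whose selected attributes all match exactly.

    Assumption: two entries count as duplicates when every selected
    attribute is present on both and has an equal value. Document
    content is never read.
    """
    groups = {}
    for entry in hit_list:
        key = tuple(entry.attributes.get(name) for name in selected_attributes)
        if any(value is None for value in key):
            # An entry missing a selected attribute cannot be compared; skip it.
            continue
        groups.setdefault(key, []).append(entry.doc_id)
    # Only keys shared by two or more entries indicate duplicates.
    return [ids for ids in groups.values() if len(ids) > 1]


# Illustrative data only; titles, sizes, and URLs are invented.
hits = [
    HitListEntry(1, 0.92, {"title": "Annual Report", "size": 20480,
                           "url": "http://a.example/report"}),
    HitListEntry(2, 0.92, {"title": "Annual Report", "size": 20480,
                           "url": "http://b.example/mirror/report"}),
    HitListEntry(3, 0.75, {"title": "Annual Report", "size": 31000,
                           "url": "http://c.example/report-v2"}),
    HitListEntry(4, 0.50, {"title": "Untitled"}),
]

duplicates = find_duplicate_groups(hits, ["title", "size"])
print(duplicates)
```

The list passed as `selected_attributes` plays the role of the hit list attribute selector. Selecting fewer attributes (for example only `title`) makes the test looser and would also group entry 3 with entries 1 and 2. Selecting more attributes makes it stricter.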

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.