Patent · US · Expired

Information retrieval systems with duplicate document detection and presentation functions

US7809695B2 · kind B2 · utility

Cited by: 12
References: 12
Claims: 39
Family size: 0

Assignee

Inventors

Key dates

Filing date: May 5, 2005
Grant date: Oct 5, 2010
Priority date
Expiry date: Jul 20, 2025

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/951
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Many companies provide online search facilities that enable users to conduct computerized searches for documents. Unfortunately, these searches frequently provide results that include duplicate documents—that is, documents that are completely or substantially identical to each other. This problem is particularly vexing when searching news stories, for example. Moreover, the duplicate documents are intermixed in the search results, leaving users to manually manage the complexities of identifying and/or filtering them. Accordingly, the present inventors devised systems, methods, and software that facilitate the identification and/or grouping of duplicate documents in search results. One exemplary system includes a signature generation module which generates document signatures based on length, temporal, and/or content components; a real-time duplicate detection module which uses the document signatures to identify “exact” or “fuzzy” duplicate documents; and a user-interface or presentation module which controls how duplicate documents are presented or suppressed in search results.
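The abstract names three parts (signature generation, duplicate detection, and presentation) but does not say how any of them is computed. The following is a minimal illustrative sketch, not the patented method. Everything in it is an assumption made for illustration: the names Signature, make_signature, classify, and group_duplicates, the use of a SHA-256 hash and 3-word shingles for the content component, and all thresholds.

```python
import hashlib
import re
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Signature:
    """Hypothetical signature with length, temporal, and content components."""
    length: int           # length component: number of word tokens
    published: date       # temporal component: publication date
    content_hash: str     # content component: hash of the normalized text
    shingles: frozenset   # content component used for fuzzy comparison


def tokenize(text):
    # Lowercase and keep alphanumeric runs, so case and punctuation are ignored.
    return re.findall(r"[a-z0-9]+", text.lower())


def make_signature(text, published, k=3):
    tokens = tokenize(text)
    normalized = " ".join(tokens)
    # Overlapping k-word windows ("shingles"); an assumed choice, not from the patent.
    shingles = frozenset(
        " ".join(tokens[i:i + k]) for i in range(max(len(tokens) - k + 1, 1))
    )
    return Signature(
        length=len(tokens),
        published=published,
        content_hash=hashlib.sha256(normalized.encode("utf-8")).hexdigest(),
        shingles=shingles,
    )


def classify(a, b, max_days=3, max_len_ratio=0.1, min_jaccard=0.8):
    """Return 'exact', 'fuzzy', or 'distinct'. All thresholds are assumed values."""
    if a.content_hash == b.content_hash:
        return "exact"
    # Cheap temporal and length checks run before the costlier set comparison.
    if abs((a.published - b.published).days) > max_days:
        return "distinct"
    longest = max(a.length, b.length, 1)
    if abs(a.length - b.length) / longest > max_len_ratio:
        return "distinct"
    union = a.shingles | b.shingles
    jaccard = len(a.shingles & b.shingles) / len(union) if union else 1.0
    return "fuzzy" if jaccard >= min_jaccard else "distinct"


def group_duplicates(results):
    """Group ranked (doc_id, Signature) pairs under the first matching document."""
    groups = []
    for doc_id, sig in results:
        for group in groups:
            # Compare against the group's first (highest-ranked) member only.
            if classify(group[0][1], sig) != "distinct":
                group.append((doc_id, sig))
                break
        else:
            groups.append([(doc_id, sig)])
    return [[doc_id for doc_id, _ in group] for group in groups]
```

In this sketch a presentation layer could show the first document of each group and collapse or suppress the rest. Comparing each result only against the first member of a group keeps the pass cheap, at the cost of missing chains of near-duplicates that drift away from that first member.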

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.