Patent · US Expired

Minimizing visibility of stale content in web searching including revising web crawl intervals of documents

US7987172B1 · kind B1 · utility

Cited by: 36
References: 48
Claims: 48
Family size: 0

Assignee

Inventor

Key dates

Filing date: Aug 30, 2004
Grant date: Jul 26, 2011
Priority date
Expiry date: Feb 14, 2026

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/951
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method and system are disclosed for associating an appropriate web crawl interval with a document so that the probability of the document's stale content being used by a search engine is below an acceptable level when the search engine crawls the document at its associated web crawl interval. The web crawl interval of a document is determined through an iterative process and updated dynamically by the search engine after every visit to the document by a web crawler. A multi-tier data structure is employed for managing the web crawl order of billions of documents on the Internet. The search engine may move a document from one tier to another if its web crawl interval changes significantly.
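
The mechanism in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the update rule or the tier boundaries, so everything here is an assumption made for illustration: the multiplicative rule (halve the interval when a visit finds changed content, grow it by 50% otherwise), the clamping bounds, the tier limits, and the names Document, tier_for and record_visit. It is not the patented method.

```python
from dataclasses import dataclass

# Hypothetical tier upper bounds in hours (daily, weekly, monthly).
# These values are assumptions; the abstract does not give any.
TIER_LIMITS_HOURS = [24.0, 24.0 * 7, 24.0 * 30]

# Hypothetical clamping bounds for the crawl interval, also assumed.
MIN_INTERVAL_HOURS = 1.0
MAX_INTERVAL_HOURS = 24.0 * 90


@dataclass
class Document:
    url: str
    interval_hours: float
    tier: int = 0


def tier_for(interval_hours: float) -> int:
    """Return the index of the first tier whose limit covers the interval."""
    for index, limit in enumerate(TIER_LIMITS_HOURS):
        if interval_hours <= limit:
            return index
    # Intervals longer than every limit fall into a final catch-all tier.
    return len(TIER_LIMITS_HOURS)


def record_visit(doc: Document, content_changed: bool) -> Document:
    """Revise the crawl interval after one visit, then re-assign the tier."""
    # Assumed rule: shrink the interval when the content changed since the
    # last crawl, lengthen it when the content was unchanged.
    factor = 0.5 if content_changed else 1.5
    new_interval = doc.interval_hours * factor
    doc.interval_hours = min(MAX_INTERVAL_HOURS, max(MIN_INTERVAL_HOURS, new_interval))
    doc.tier = tier_for(doc.interval_hours)
    return doc
```

Under these assumed values, a document crawled every 20 hours that is found unchanged moves to a 30-hour interval and from the first tier to the second. If the next visit finds changed content, the interval drops to 15 hours and the document returns to the first tier.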

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.