Patent · US Active

Minimizing visibility of stale content in web searching including revising web crawl intervals of documents

US8407204B2 · kind B2 · utility

Cited by: 11
References: 62
Claims: 45
Family size: 0

Assignee

Inventor

Key dates

Filing date: Jun 22, 2011
Grant date: Mar 26, 2013
Priority date
Expiry date: Jun 22, 2031

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/951
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method and system is disclosed for associating an appropriate web crawl interval with a document so that the probability of the document's stale content being used by a search engine is below an acceptable level when the search engine crawls the document at its associated web crawl interval. The web crawl interval of a document is determined through an iterative process and updated dynamically by the search engine after every visit to the document by a web crawler. A multi-tier data structure is employed for managing the web crawl order of billions of documents on the Internet. The search engine may move a document from one tier to another if its web crawl interval is changed significantly.
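
The loop described in the abstract (revise a document's crawl interval after each visit, then re-file the document in a tier if the interval has moved far enough) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented method: the abstract does not give an update rule, tier boundaries, or interval limits, so the halve-on-change and grow-by-1.5-on-no-change rule, the constants, and the names Document, tier_for_interval, and update_after_crawl are all hypothetical. Here a "significant" change is taken to mean one that crosses a tier boundary.

```python
from dataclasses import dataclass

# Hypothetical tier upper bounds in hours (daily, weekly, everything slower).
# The abstract does not specify tier boundaries.
TIER_UPPER_BOUNDS_HOURS = [24.0, 24.0 * 7, float("inf")]

# Hypothetical clamps so the interval stays in a sensible range.
MIN_INTERVAL_HOURS = 1.0
MAX_INTERVAL_HOURS = 24.0 * 30


@dataclass
class Document:
    url: str
    crawl_interval_hours: float
    tier: int = 0


def tier_for_interval(interval_hours: float) -> int:
    """Return the index of the first tier whose upper bound covers the interval."""
    for index, upper_bound in enumerate(TIER_UPPER_BOUNDS_HOURS):
        if interval_hours <= upper_bound:
            return index
    return len(TIER_UPPER_BOUNDS_HOURS) - 1


def update_after_crawl(doc: Document, content_changed: bool) -> Document:
    """Revise the crawl interval after one crawler visit (assumed update rule)."""
    if content_changed:
        # The stored copy was stale: crawl more often.
        new_interval = doc.crawl_interval_hours / 2.0
    else:
        # Nothing changed: the crawler can back off.
        new_interval = doc.crawl_interval_hours * 1.5
    new_interval = min(MAX_INTERVAL_HOURS, max(MIN_INTERVAL_HOURS, new_interval))
    doc.crawl_interval_hours = new_interval
    # Re-file the document; the tier only changes when a boundary is crossed.
    doc.tier = tier_for_interval(new_interval)
    return doc
```

For example, a document with a 20-hour interval that is found unchanged would move to a 30-hour interval and from the first tier to the second under these assumed constants; a later visit that finds changed content would halve the interval to 15 hours and move it back.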

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.