Minimizing visibility of stale content in web searching including revising web crawl intervals of documents
US7987172B1 · kind B1 · utility
Assignee
Inventor
Key dates
| Filing date | Aug 30, 2004 |
| Grant date | Jul 26, 2011 |
| Priority date | — |
| Expiry date | Feb 14, 2026 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/951
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method and system are disclosed for associating an appropriate web crawl interval with a document so that the probability of the document's stale content being used by a search engine is below an acceptable level when the search engine crawls the document at its associated web crawl interval. The web crawl interval of a document is determined through an iterative process and updated dynamically by the search engine after every visit to the document by a web crawler. A multi-tier data structure is employed for managing the web crawl order of billions of documents on the Internet. The search engine may move a document from one tier to another if its web crawl interval changes significantly.
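The abstract does not state the update rule or the tier layout, so the following is only a minimal sketch of the general idea: revise a document's crawl interval after each visit, then place the document in a tier that matches the new interval. Every name and constant here (`Document`, `update_after_crawl`, `tier_for`, the halve-or-grow factors, the tier bounds in hours, the interval limits) is a hypothetical assumption for illustration and is not taken from the patent. The sketch also re-tiers whenever an interval crosses a tier bound, which is a simplification of "changed significantly".

```python
from dataclasses import dataclass

# Hypothetical tier upper bounds in hours (daily, weekly, everything else).
# These values are illustrative assumptions, not from the patent.
TIER_UPPER_BOUNDS_HOURS = [24.0, 168.0, float("inf")]

# Hypothetical limits that keep the interval in a sane range.
MIN_INTERVAL_HOURS = 1.0
MAX_INTERVAL_HOURS = 720.0


@dataclass
class Document:
    url: str
    interval_hours: float
    tier: int = 0


def tier_for(interval_hours: float) -> int:
    """Return the index of the first tier whose upper bound covers the interval."""
    for index, upper in enumerate(TIER_UPPER_BOUNDS_HOURS):
        if interval_hours <= upper:
            return index
    return len(TIER_UPPER_BOUNDS_HOURS) - 1


def update_after_crawl(doc: Document, content_changed: bool) -> Document:
    """Revise the crawl interval after one visit and re-tier the document.

    Assumed rule: halve the interval if the content changed since the
    previous visit (crawl more often), otherwise grow it by 1.5x.
    """
    factor = 0.5 if content_changed else 1.5
    new_interval = doc.interval_hours * factor
    # Clamp to the assumed limits before assigning a tier.
    doc.interval_hours = min(MAX_INTERVAL_HOURS, max(MIN_INTERVAL_HOURS, new_interval))
    doc.tier = tier_for(doc.interval_hours)
    return doc


if __name__ == "__main__":
    doc = Document(url="https://example.com/", interval_hours=20.0)
    for changed in (False, False, True):
        update_after_crawl(doc, changed)
        print(doc.interval_hours, doc.tier)
```

Under these assumptions, a document that keeps changing drifts toward the most frequently crawled tier, while a stable document drifts toward the least frequently crawled one. A production system would presumably derive the interval from an estimated staleness probability rather than fixed multipliers.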
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.