Patent · US Active

System and method for crawl ordering by search impact

US7899807B2 · kind B2 · utility

Cited by: 5
References: 5
Claims: 24
Family size: 0

Assignee

Inventors

Key dates

Filing date: Dec 20, 2007
Grant date: Mar 1, 2011
Priority date
Expiry date: Mar 20, 2029

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/951
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

An improved system and method for crawl ordering of a web crawler by impact upon search results of a search engine is provided. Content-independent features of uncrawled web pages may be obtained, and the impact of uncrawled web pages may be estimated for queries of a workload using the content-independent features. The impact of uncrawled web pages may be estimated for queries by computing an expected impact score for uncrawled web pages that match needy queries. Query sketches may be created for a subset of the queries by computing an expected impact score for crawled web pages and uncrawled web pages matching the queries. Web pages may then be selected to fetch using a combined query-based estimate and query-independent estimate of the impact of fetching the web pages on search query results.
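
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract gives no formulas, so the matching rule, the neediness threshold, the frequency-times-neediness score, and the linear blend with weight alpha are all assumptions made for illustration, as are the names Query, UncrawledPage, query_based_impact, combined_impact, and crawl_order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    text: str
    frequency: float  # assumed: share of the workload this query accounts for
    neediness: float  # assumed: 0.0 = well served by current results, 1.0 = poorly served


@dataclass(frozen=True)
class UncrawledPage:
    url: str
    feature_terms: frozenset  # assumed content-independent terms (e.g. from the URL)
    prior_score: float        # assumed query-independent estimate in [0, 1]


def matches(query: Query, page: UncrawledPage) -> bool:
    """Hypothetical match test: every query term appears among the page's feature terms."""
    return set(query.text.lower().split()) <= page.feature_terms


def query_based_impact(page, workload, needy_threshold=0.5):
    """Assumed score: sum of frequency * neediness over the needy queries the page matches."""
    return sum(
        q.frequency * q.neediness
        for q in workload
        if q.neediness >= needy_threshold and matches(q, page)
    )


def combined_impact(page, workload, alpha=0.7):
    """Assumed linear blend of the query-based and query-independent estimates."""
    return alpha * query_based_impact(page, workload) + (1 - alpha) * page.prior_score


def crawl_order(pages, workload, alpha=0.7):
    """Return pages sorted so that the highest estimated impact is fetched first."""
    return sorted(pages, key=lambda p: combined_impact(p, workload, alpha), reverse=True)
```

Calling crawl_order(pages, workload) returns the uncrawled pages in descending order of estimated impact. The weight alpha controls how strongly the workload-driven estimate outweighs the query-independent prior; a real crawler would have to tune or replace this blend, since the abstract only states that the two estimates are combined.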

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.