Patent · US Active

Scheduler for search engine crawler

US8707313B1 · kind B1 · utility

Cited by	1

References	64

Claims	15

Family size	0

Assignee

Google LLC · US

Inventors

Huican Zhu · San Jose, US
Maximilian Ibel · Sursee, CH
Anurag Acharya · Campbell, US
Howard Gobioff · San Francisco, US

Key dates

Filing date	Feb 18, 2011
Grant date	Apr 22, 2014
Priority date	—
Expiry date	Sep 25, 2031

Classification

Technology area (CPC G)	Physics
CPC primary	G06F16/951
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A search engine crawler includes a distributed set of schedulers that are associated with one or more segments of document identifiers (e.g., URLs) corresponding to documents on a network (e.g., WWW). Each scheduler handles the scheduling of document identifiers (for crawling) for a subset of the known document identifiers. Using a starting set of document identifiers, such as the document identifiers crawled (or scheduled for crawling) during the most recent completed crawl, the scheduler removes from the starting set those document identifiers that have been unreachable in each of the last X crawls. Other filtering mechanisms may also be used to filter out some of the document identifiers in the starting set. The resulting list of document identifiers is written to a scheduled output file for use in a next crawl cycle.
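
The filtering step described in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the function name `schedule_next_crawl`, the `history` mapping (per-identifier reachability results, oldest first) and the parameter `x` are all hypothetical, and the other filtering mechanisms the abstract mentions are not modeled.

```python
from typing import Dict, Iterable, List


def schedule_next_crawl(
    starting_set: Iterable[str],
    history: Dict[str, List[bool]],
    x: int,
) -> List[str]:
    """Return the identifiers to schedule for the next crawl cycle.

    Assumed data model (not from the patent text): history[doc_id] is a
    list of booleans, oldest crawl first, where True means the document
    was reachable in that crawl.
    """
    if x < 1:
        raise ValueError("x must be at least 1")

    scheduled: List[str] = []
    for doc_id in starting_set:
        # Look only at the x most recent crawl results for this identifier.
        recent = history.get(doc_id, [])[-x:]
        # Drop the identifier only if there are x results and all are failures.
        unreachable_in_each = len(recent) == x and not any(recent)
        if not unreachable_in_each:
            scheduled.append(doc_id)
    return scheduled
```

In this sketch, an identifier with fewer than `x` recorded crawls is kept, since it cannot yet have been unreachable in each of the last `x` crawls. The returned list corresponds to what the abstract calls the scheduled output; writing it to a file is left out for brevity.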

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.