Patent · US Active

Document reuse in a search engine crawler

US8707312B1 · kind B1 · utility

Cited by: 2
References: 68
Claims: 37
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jun 30, 2004
Grant date: Apr 22, 2014
Priority date
Expiry date: Aug 11, 2028

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/951
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A search engine crawler includes a scheduler for determining which documents to download from their respective host servers. Some documents, known to be stable based on one or more records from prior crawls, are reused from a document repository. A reuse flag is set in a scheduler record that also contains a document identifier, the reuse flag indicating whether the document should be retrieved from a first database, such as the World Wide Web, or a second database, such as a document repository. A set of such scheduler records is used during a crawl by the search engine crawler to determine which database to use when retrieving the documents identified in the scheduler records.
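
The mechanism in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `SchedulerRecord`, `build_schedule`, `crawl`, and the `stable_after` threshold are hypothetical, the "records from prior crawls" are reduced to a count of consecutive unchanged crawls, and falling back to the web when the repository has no copy is an added assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SchedulerRecord:
    doc_id: str  # document identifier (a URL in this sketch)
    reuse: bool  # True: read from the repository; False: download from the host


def build_schedule(unchanged_crawls: Dict[str, int],
                   stable_after: int = 3) -> List[SchedulerRecord]:
    """Set the reuse flag for documents that look stable.

    `unchanged_crawls` maps each document to the number of consecutive
    prior crawls in which its content did not change (an assumed,
    simplified form of crawl history).
    """
    return [
        SchedulerRecord(doc_id=doc_id, reuse=count >= stable_after)
        for doc_id, count in unchanged_crawls.items()
    ]


def crawl(records: List[SchedulerRecord],
          fetch_from_web: Callable[[str], str],
          repository: Dict[str, str]) -> Dict[str, str]:
    """Retrieve each document from the source its reuse flag selects."""
    results: Dict[str, str] = {}
    for rec in records:
        if rec.reuse and rec.doc_id in repository:
            # Second database: reuse the stored copy, no network request.
            results[rec.doc_id] = repository[rec.doc_id]
        else:
            # First database: download from the host and refresh the repository.
            content = fetch_from_web(rec.doc_id)
            repository[rec.doc_id] = content
            results[rec.doc_id] = content
    return results
```

In this sketch the decision is made once, when the schedule is built, so the crawl loop only has to read the flag. That separation mirrors the abstract's split between the scheduler, which sets the flag, and the crawler, which consults it.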

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.