Patent · US Active

Web crawler scheduler that utilizes sitemaps from websites

US9002819B2 · kind B2 · utility

Cited by: 7
References: 25
Claims: 33
Family size: 0

Assignee

Inventors

Key dates

Filing date: Apr 8, 2013
Grant date: Apr 7, 2015
Priority date
Expiry date: Jun 16, 2033

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/951
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Systems and methods for scheduling documents for crawling are disclosed. In some implementations, a method includes obtaining sitemap information for a plurality of websites, and analyzing the sitemap information to identify a website in the plurality of websites whose sitemap information is at least potentially out of date. The method also includes updating the sitemap information for the identified website by downloading updated sitemap information for that website, and scheduling documents for crawling in accordance with the updated sitemap information.
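
The steps named in the abstract (obtain sitemap information, identify a website whose information may be out of date, download an updated sitemap, schedule documents) can be illustrated with a minimal Python sketch. It is not the patented implementation. The staleness test (age of the last download against a threshold), the `SitemapInfo` record, the `fetch` callback, and the rule of scheduling only URLs whose `lastmod` value is new or changed are all assumptions made for illustration; the abstract does not specify any of them.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict, List

# Namespace used by sitemap files that follow the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


@dataclass
class SitemapInfo:
    """Hypothetical record of what the scheduler knows about one website."""
    sitemap_url: str
    fetched_at: float        # time of the last sitemap download (epoch seconds)
    urls: Dict[str, str]     # document URL -> lastmod string ("" if absent)


def is_potentially_stale(info: SitemapInfo, now: float, max_age: float) -> bool:
    # Assumed criterion: the stored copy is older than max_age seconds.
    return now - info.fetched_at > max_age


def parse_sitemap(xml_text: str) -> Dict[str, str]:
    """Extract {loc: lastmod} pairs from a sitemap <urlset> document."""
    root = ET.fromstring(xml_text)
    urls: Dict[str, str] = {}
    for url_el in root.iter(SITEMAP_NS + "url"):
        loc = url_el.findtext(SITEMAP_NS + "loc", default="").strip()
        lastmod = url_el.findtext(SITEMAP_NS + "lastmod", default="").strip()
        if loc:
            urls[loc] = lastmod
    return urls


def refresh_and_schedule(
    sitemaps: Dict[str, SitemapInfo],
    fetch: Callable[[str], str],
    now: float,
    max_age: float,
) -> List[str]:
    """Refresh stale sitemaps in place and return the document URLs to crawl."""
    to_crawl: List[str] = []
    for site, info in sitemaps.items():
        if not is_potentially_stale(info, now, max_age):
            continue
        # Download updated sitemap information for the identified website.
        new_urls = parse_sitemap(fetch(info.sitemap_url))
        # Assumed policy: schedule documents that are new or whose lastmod changed.
        for loc, lastmod in new_urls.items():
            if info.urls.get(loc) != lastmod:
                to_crawl.append(loc)
        sitemaps[site] = SitemapInfo(info.sitemap_url, now, new_urls)
    return sorted(to_crawl)
```

Passing `fetch` as a parameter keeps the sketch independent of any particular HTTP client; a real crawler would supply a function that downloads the sitemap URL and returns its text.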

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.