Patent · US Active

Method and techniques for determining crawling schedule

US8862569B2 · kind B2 · utility

Cited by: 9
References: 1
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jan 11, 2012
Grant date: Oct 14, 2014
Priority date
Expiry date: Jan 11, 2032

Classification

  • Technology area (CPC H): Electricity
  • CPC primary: H04L67/02
  • WIPO field: Digital communication
  • WIPO sector: Electrical engineering

Abstract

Methods, systems and computer-readable storage medium for determining a crawling schedule. In an aspect, a method includes obtaining crawl history data for a Web site having Web pages, determining a status of the Web pages, determining a total quantity of Web pages that have a status of deleted, calculating a probability that another Web page of the Web site will be removed based on the total quantity, and storing data associating the calculated probability with the Web site. The method can further include determining, for a plurality of sets of the previous time periods, a respective crawl penalty as a combination of a penalty for crawling the Web site and a penalty for showing a deleted Web page based on the calculated probability, and determining a re-crawl schedule based on the crawl penalties.
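
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented method. The abstract does not give the estimator or the penalty formula, so everything below is an assumption made for illustration: the names `CrawlRecord`, `deletion_probability`, `crawl_penalty`, and `best_interval`, the use of the deleted-page fraction as a per-day deletion probability, the cost weights, and the candidate intervals.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass
class CrawlRecord:
    """One page observation from crawl history (hypothetical structure)."""
    url: str
    status: str  # "live" or "deleted"


def deletion_probability(history: Sequence[CrawlRecord]) -> float:
    """Assumed estimator: fraction of observed pages with status 'deleted'."""
    if not history:
        return 0.0
    deleted = sum(1 for record in history if record.status == "deleted")
    return deleted / len(history)


def crawl_penalty(interval_days: int, p_delete_per_day: float,
                  crawl_cost: float = 1.0, stale_cost: float = 10.0) -> float:
    """Assumed penalty: amortized crawl cost plus expected stale-page cost.

    The stale term uses the chance that a page is deleted at least once
    within the interval, treating days as independent (an assumption).
    """
    p_stale = 1.0 - (1.0 - p_delete_per_day) ** interval_days
    return crawl_cost / interval_days + stale_cost * p_stale


def best_interval(p_delete_per_day: float,
                  candidates: Iterable[int] = (1, 2, 7, 14, 30)) -> int:
    """Pick the candidate re-crawl interval (in days) with the lowest penalty."""
    return min(candidates, key=lambda d: crawl_penalty(d, p_delete_per_day))


if __name__ == "__main__":
    history = [
        CrawlRecord("https://example.com/a", "live"),
        CrawlRecord("https://example.com/b", "deleted"),
        CrawlRecord("https://example.com/c", "live"),
        CrawlRecord("https://example.com/d", "live"),
    ]
    p = deletion_probability(history)
    print(p, best_interval(p))
```

In this sketch, a site whose pages are rarely deleted gets a long re-crawl interval because the crawl cost dominates, while a site with frequent deletions gets a short one because the cost of showing a deleted page dominates.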

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.