Method and techniques for determining crawling schedule
US8862569B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jan 11, 2012 |
| Grant date | Oct 14, 2014 |
| Priority date | — |
| Expiry date | Jan 11, 2032 |
Classification
- Technology area (CPC H): Electricity
- CPC primary: H04L67/02
- WIPO field: Digital communication
- WIPO sector: Electrical engineering
Abstract
Methods, systems and computer-readable storage medium for determining a crawling schedule. In an aspect, a method includes obtaining crawl history data for a Web site having Web pages, determining a status of the Web pages, determining a total quantity of Web pages that have a status of deleted, calculating a probability that another Web page of the Web site will be removed based on the total quantity, and storing data associating the calculated probability with the Web site. The method can further include determining, for a plurality of sets of the previous time periods, a respective crawl penalty as a combination of a penalty for crawling the Web site and a penalty for showing a deleted Web page based on the calculated probability, and determining a re-crawl schedule based on the crawl penalties.
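The steps named in the abstract can be sketched as follows. This is only an illustrative reading, not the patented method: the abstract gives no formulas, so the deletion probability (deleted pages divided by pages crawled), the penalty formula, and every name here (`CrawlRecord`, `deletion_probability`, `crawl_penalty`, `choose_recrawl_interval`, `crawl_cost`, `stale_cost`) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlRecord:
    """One page observation from a site's crawl history (hypothetical shape)."""
    url: str
    status: str  # "live" or "deleted"


def deletion_probability(history):
    """Assumed estimate: share of crawled pages whose status is 'deleted'."""
    if not history:
        return 0.0
    deleted = sum(1 for record in history if record.status == "deleted")
    return deleted / len(history)


def crawl_penalty(interval, p_delete, crawl_cost, stale_cost):
    """Assumed combined penalty for re-crawling every `interval` periods.

    First term: cost of crawling, amortized over the interval.
    Second term: cost of showing a deleted page, weighted by the chance
    that a page was deleted at some point during the interval.
    """
    p_stale = 1.0 - (1.0 - p_delete) ** interval
    return crawl_cost / interval + stale_cost * p_stale


def choose_recrawl_interval(candidates, p_delete, crawl_cost, stale_cost):
    """Pick the candidate interval with the lowest combined penalty."""
    return min(
        candidates,
        key=lambda interval: crawl_penalty(interval, p_delete, crawl_cost, stale_cost),
    )


if __name__ == "__main__":
    # Toy history: 2 of 8 pages were found deleted.
    history = [CrawlRecord(f"/page/{i}", "deleted" if i < 2 else "live") for i in range(8)]
    p = deletion_probability(history)
    best = choose_recrawl_interval([1, 2, 4, 8], p, crawl_cost=1.0, stale_cost=2.0)
    print(f"p_delete={p:.2f}, re-crawl every {best} period(s)")
```

In this sketch a higher deletion probability or a higher `stale_cost` pushes the chosen interval down (more frequent crawling), while a higher `crawl_cost` pushes it up.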
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.