Web crawler scheduler that utilizes sitemaps from websites
US9002819B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Apr 8, 2013 |
| Grant date | Apr 7, 2015 |
| Priority date | — |
| Expiry date | Jun 16, 2033 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/951
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Systems and methods for scheduling documents for crawling are disclosed. In some implementations, a method includes obtaining sitemap information for a plurality of websites; and analyzing the sitemap information to identify a website, in the plurality of websites. The website has sitemap information that is at least potentially out of date. The method also includes updating the sitemap information for the identified website by downloading updated sitemap information for the identified website; and scheduling documents for crawling in accordance with the updated sitemap information for the identified website.
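The method in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `SitemapRecord`, `is_potentially_stale`, `schedule_crawl`, and the injected `fetch_sitemap` callable are hypothetical, and the age-based staleness test is an assumed stand-in, since the abstract does not say how "potentially out of date" is determined. Sitemaps are assumed to follow the sitemaps.org XML format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, List
import xml.etree.ElementTree as ET

# Namespace used by sitemaps.org-style <urlset> documents.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


@dataclass
class SitemapRecord:
    """Hypothetical stored sitemap information for one website."""
    site: str
    fetched_at: datetime
    urls: List[str]


def parse_sitemap(xml_text: str) -> List[str]:
    """Extract every <loc> URL from a sitemap XML string."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


def is_potentially_stale(record: SitemapRecord, now: datetime,
                         max_age: timedelta) -> bool:
    """Assumed criterion: the stored copy is older than max_age."""
    return now - record.fetched_at > max_age


def schedule_crawl(records: Dict[str, SitemapRecord],
                   fetch_sitemap: Callable[[str], str],
                   now: datetime,
                   max_age: timedelta = timedelta(days=1)) -> List[str]:
    """Refresh potentially stale sitemaps and return the URLs to crawl.

    fetch_sitemap is injected so the download step can be swapped for a
    real HTTP client or a test stub.
    """
    queue: List[str] = []
    for record in records.values():
        if not is_potentially_stale(record, now, max_age):
            continue  # stored sitemap information is considered current
        # Update the stored sitemap information for the identified website.
        record.urls = parse_sitemap(fetch_sitemap(record.site))
        record.fetched_at = now
        # Schedule documents according to the updated sitemap.
        queue.extend(record.urls)
    return queue


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://a.example/new</loc></url></urlset>"
    )
    stored = {
        "a.example": SitemapRecord("a.example", now - timedelta(days=3), []),
    }
    print(schedule_crawl(stored, lambda site: sample, now))
```

Passing the download step in as a callable keeps the sketch runnable without network access; a real scheduler would likely also weigh crawl priority and politeness limits, which the abstract does not describe.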
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.