Patent · US Expired

Unsupervised, automated web host dynamicity detection, dead link detection and prerequisite page discovery for search indexed web pages

US7610267B2 · kind B2 · utility

Cited by: 7
References: 6
Claims: 26
Family size: 0

Assignee

Inventors

Key dates

Filing date: Aug 13, 2005
Grant date: Oct 27, 2009
Priority date
Expiry date: Mar 17, 2026

Classification

  • Technology area (CPC Y): Emerging Cross-Sectional Technologies
  • CPC primary: Y10S707/99933
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Automated crawling of page links associated with a site domain that was previously crawled involves computing the dynamicity of a site based on totals of continuous dead links, live links and/or prerequisite pages encountered while crawling page links corresponding to the site. The degree to which links are crawled is optimized based on the dynamicity of the site. Some pages require that another particular page (i.e., a prerequisite page) is retrieved from the host prior to retrieving a given page, e.g., so that the prerequisite page can set a cookie. Prerequisite pages are determined based on stored information about pages that were retrieved, during a previous crawl, prior to retrieving a page. Prerequisite pages are identified to a search system so that when a user clicks on the URL for the page, the request is redirected to the prerequisite page to set the cookie appropriately.
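
The two ideas in the abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract gives no formula, so the dynamicity score (share of dead links), the link budget rule, and all names (CrawlTotals, dynamicity, crawl_budget, resolve_click) are assumptions made here for illustration only. The sketch also assumes that a more dynamic site earns a larger re-crawl budget, which the abstract does not state.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CrawlTotals:
    """Per-site totals gathered while re-crawling known page links (hypothetical)."""
    dead_links: int = 0
    live_links: int = 0
    max_consecutive_dead: int = 0
    current_dead_run: int = 0

    def record(self, is_dead: bool) -> None:
        """Update the totals with the outcome of one link check."""
        if is_dead:
            self.dead_links += 1
            self.current_dead_run += 1
            self.max_consecutive_dead = max(
                self.max_consecutive_dead, self.current_dead_run
            )
        else:
            self.live_links += 1
            self.current_dead_run = 0


def dynamicity(totals: CrawlTotals) -> float:
    """Assumed score: fraction of checked links that were dead (0.0 to 1.0)."""
    checked = totals.dead_links + totals.live_links
    if checked == 0:
        return 0.0
    return totals.dead_links / checked


def crawl_budget(score: float, min_links: int = 10, max_links: int = 1000) -> int:
    """Assumed policy: scale the number of links to re-crawl linearly with the score."""
    return min_links + round(score * (max_links - min_links))


def resolve_click(url: str, prerequisites: Dict[str, str]) -> str:
    """Return the prerequisite page for a URL if one was recorded, else the URL itself.

    `prerequisites` stands in for the stored information from a previous crawl.
    """
    return prerequisites.get(url, url)
```

In this sketch a crawler would call record() once per checked link, then pass dynamicity() into crawl_budget() to decide how many of the site's links to revisit. A search front end would call resolve_click() so that a click on a cookie-dependent page is first sent to its recorded prerequisite page.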

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.