Patent · US Active

System and a method for focused re-crawling of Web sites

US7379932B2 · kind B2 · utility

Cited by: 9
References: 5
Claims: 9
Family size: 0

Assignee

Inventors

Key dates

Filing date: Dec 21, 2005
Grant date: May 27, 2008
Priority date
Expiry date: Sep 25, 2026

Classification

  • Technology area (CPC Y): Emerging Cross-Sectional Technologies
  • CPC primary: Y10S707/99935
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method (100) of crawling the Web (620) is disclosed. The method (100) crawls (120) Web pages on the Web starting from a given (110) set of seed Universal Resource Locators (URLs). Crawled Web pages are partitioned (140) into sets of relevant and irrelevant pages. A set of exclusion and/or inclusion patterns is discovered (150) from the sets of relevant and irrelevant pages, and subsequent crawling of the Web is restricted through the set of exclusion and/or inclusion patterns.
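
The steps named in the abstract (crawl from seeds, partition, discover patterns, restrict the next crawl) can be illustrated with a minimal Python sketch. It is not the patented method. It assumes a hypothetical in-memory "Web" (TOY_WEB), a keyword test standing in for the relevance partition, and exclusion patterns defined as URL prefixes (scheme, host, first path segment) that occur only among irrelevant pages. All function names, the example URLs, and the prefix heuristic are illustrative assumptions, and inclusion patterns are omitted for brevity.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical in-memory "Web": URL -> (page text, outgoing links).
TOY_WEB = {
    "http://example.com/": ("home", ["http://example.com/docs/a",
                                     "http://example.com/ads/1"]),
    "http://example.com/docs/a": ("crawler docs", ["http://example.com/docs/b"]),
    "http://example.com/docs/b": ("crawler notes", []),
    "http://example.com/ads/1": ("buy now", ["http://example.com/ads/2"]),
    "http://example.com/ads/2": ("buy more", []),
}


def crawl(web, seeds, excluded_prefixes=()):
    """Breadth-first crawl from the seeds, skipping excluded URL prefixes."""
    seen, queue, pages = set(seeds), deque(seeds), {}
    while queue:
        url = queue.popleft()
        if url not in web or any(url.startswith(p) for p in excluded_prefixes):
            continue  # unknown page, or restricted by an exclusion pattern
        text, links = web[url]
        pages[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


def partition(pages, is_relevant):
    """Split crawled URLs into (relevant, irrelevant) sets."""
    relevant = {url for url, text in pages.items() if is_relevant(text)}
    return relevant, set(pages) - relevant


def url_prefix(url):
    """Return scheme://host/first-segment/ or None for top-level pages."""
    parts = urlparse(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2:
        return None
    return f"{parts.scheme}://{parts.netloc}/{segments[0]}/"


def discover_exclusions(relevant, irrelevant):
    """Assumed heuristic: prefixes seen only among irrelevant pages."""
    bad = {url_prefix(u) for u in irrelevant} - {None}
    good = {url_prefix(u) for u in relevant} - {None}
    return bad - good


seeds = ["http://example.com/"]
first_pass = crawl(TOY_WEB, seeds)
relevant, irrelevant = partition(first_pass, lambda text: "crawler" in text)
exclusions = discover_exclusions(relevant, irrelevant)
second_pass = crawl(TOY_WEB, seeds, excluded_prefixes=tuple(exclusions))
```

In this toy run the first pass visits all five pages, while the re-crawl skips everything under the discovered exclusion prefix. A real crawler would fetch pages over HTTP and would likely use a richer pattern language than plain prefixes.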

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.