Patent · US Expired

Collaborative team crawling: Large scale information gathering over the internet

US6182085A · kind A · utility

Cited by: 105
References: 4
Claims: 43
Family size: 0

Assignee

Inventors

Key dates

Filing date: May 28, 1998
Grant date: Jan 30, 2001
Priority date
Expiry date: May 28, 2018

Classification

  • Technology area (CPC Y): Emerging Cross-Sectional Technologies
  • CPC primary: Y10S707/99948
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A distributed collection of web crawlers gathers information over a large portion of cyberspace. The crawlers share the overall crawling workload through a cyberspace partition scheme, and they collaborate through load balancing to make maximal use of each crawler's computing resources. The invention takes advantage of the hierarchical nature of the cyberspace namespace and uses the syntactic components of the URL structure as the main vehicle for dividing and assigning crawling workload to individual crawlers. The partition scheme is completely distributed: each crawler makes its partitioning decisions based on its own crawling status and a globally replicated partition tree data structure.
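
The idea of partitioning by URL structure can be illustrated with a small sketch. This is not the patented method or its claimed data structure; it is a minimal illustration that assumes a partition tree can be modeled as a prefix tree over URL components (host labels from the top-level domain down, then path segments), with the deepest assigned prefix deciding which crawler owns a URL. The names `url_key`, `PartitionTree`, `assign`, `owner_of`, and the crawler labels are hypothetical.

```python
from urllib.parse import urlsplit


def url_key(url):
    """Hierarchical key for a URL: reversed host labels, then path segments."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    labels = [label for label in reversed(host.split(".")) if label]
    segments = [seg for seg in parts.path.split("/") if seg]
    return tuple(labels + segments)


class PartitionTree:
    """Prefix tree over URL keys (an assumed model, not the patent's structure).

    A node may name the crawler that owns its whole subtree; nodes without
    an owner inherit the owner of their nearest assigned ancestor.
    """

    def __init__(self, default_owner):
        self.root = {"owner": default_owner, "children": {}}

    def assign(self, prefix, owner):
        """Give `owner` every URL whose key starts with `prefix`."""
        node = self.root
        for part in prefix:
            node = node["children"].setdefault(
                part, {"owner": None, "children": {}}
            )
        node["owner"] = owner

    def owner_of(self, url):
        """Return the owner of the deepest assigned prefix matching `url`."""
        node = self.root
        owner = node["owner"]
        for part in url_key(url):
            node = node["children"].get(part)
            if node is None:
                break
            if node["owner"] is not None:
                owner = node["owner"]
        return owner


tree = PartitionTree("crawler-0")
tree.assign(("com",), "crawler-1")
# A busy crawler could hand off one subtree; here that is a single assign call.
tree.assign(("com", "example", "www", "docs"), "crawler-2")
print(tree.owner_of("http://www.example.com/docs/a.html"))
print(tree.owner_of("http://www.example.com/index.html"))
print(tree.owner_of("http://example.org/"))
```

In this sketch, handing a subtree to another crawler is one `assign` call on a more specific prefix, which is one plausible way to picture load balancing over a hierarchical namespace. Replicating the tree across crawlers and deciding when to split are not modeled here.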

Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.