Patent · US · Expired

System and method for focussed web crawling

US6418433B1 · kind B1 · utility

Cited by	313
References	8
Claims	32
Family size	0

Assignee

International Business Machines Corporation · US

Inventors

Soumen Chakrabarti · Mumbai, IN
Byron Edward Dom · Los Gatos, US
Martin Henk Van Den Berg · Palo Alto, US

Key dates

Filing date	Jan 28, 1999
Grant date	Jul 9, 2002
Priority date	—
Expiry date	Jan 28, 2019

Classification

Technology area (CPC Y)	Emerging Cross-Sectional Technologies
CPC primary	Y10S707/99935
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A focussed Web crawler learns to recognize Web pages that are relevant to the interest of one or more users, from a set of examples provided by the users. It then explores the Web starting from the example set, using the statistics collected from the examples and other analysis on the link graph of the growing crawl database, to guide itself towards relevant, valuable resources and away from irrelevant and/or low quality material on the Web. Thereby, the Web crawler builds a comprehensive topic-specific library for the benefit of specific users.
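
The crawl loop the abstract describes can be illustrated with a minimal sketch. This is not the patented method: it assumes a toy in-memory "web" (the `pages` and `links` dictionaries), replaces the learned classifier with simple term frequencies from the example pages, and omits the link-graph analysis the abstract mentions. All function names, parameters, and data here are hypothetical.

```python
import heapq
from collections import Counter

def train_term_weights(example_texts):
    """Relative term frequencies over the example pages (stand-in for a real classifier)."""
    counts = Counter(word for text in example_texts for word in text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def relevance(text, weights):
    """Average example-set weight of the words on a page; 0.0 for an empty page."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(weights.get(w, 0.0) for w in words) / len(words)

def focused_crawl(pages, links, seeds, weights, max_pages=10, threshold=0.05):
    """Best-first crawl: visit next the URL whose referring page scored highest."""
    frontier = [(-1.0, url) for url in seeds]  # heapq is a min-heap, so priorities are negated
    heapq.heapify(frontier)
    visited, library = set(), []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        if url in visited or url not in pages:
            continue
        visited.add(url)
        score = relevance(pages[url], weights)
        if score < threshold:
            continue  # irrelevant page: not stored, and its out-links are not followed
        library.append((url, score))
        for target in links.get(url, []):
            if target not in visited:
                heapq.heappush(frontier, (-score, target))
    return library

# Hypothetical toy data: page text keyed by URL, and out-links per page.
pages = {
    "seed": "cycling bike race training",
    "a": "bike race results and cycling news",
    "b": "stock market prices today",
    "c": "cycling training plans",
    "d": "bike training race",
}
links = {"seed": ["a", "b"], "a": ["c"], "b": ["d"]}

weights = train_term_weights([pages["seed"]])
library = focused_crawl(pages, links, ["seed"], weights)
crawled_urls = [url for url, _ in library]
```

In this toy run the off-topic page "b" is fetched but not expanded, so page "d" is never reached even though it is relevant. That is the trade-off of pruning on the referring page's score; a real system would need a softer rule or the link-graph signals the abstract refers to.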

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.