Patent · US Expired

Method and system for trawling the World-wide Web to identify implicitly-defined communities of web pages

US6886129B1 · kind B1 · utility

Cited by: 44
References: 5
Claims: 30
Family size: 0

Assignee

Inventors

Key dates

Filing date: Nov 24, 1999
Grant date: Apr 26, 2005
Priority date
Expiry date: Nov 24, 2019

Classification

  • Technology area (CPC Y): Emerging Cross-Sectional Technologies
  • CPC primary: Y10S707/99933
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method and system for identifying groups of pages of common interest from a collection of hyper-linked pages are disclosed. A plurality of community cores are identified from the collection, where each core includes first and second sets of pages, and each page in the first set points to every page in the second set. Each identified core is expanded into a full community, which is a subset of the pages regarding a particular topic. The identification of community cores is based on the analysis of the Web graph, in which the communities correspond to instances of Web subgraphs. Extraneous pages are then pruned to improve the quality of the resulting communities.
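
The core condition in the abstract (every page in the first set points to every page in the second set) describes a complete bipartite subgraph of the link graph. The following is a minimal illustrative sketch of that condition only, not the claimed method: it enumerates cores by brute force on a toy graph. The function name find_cores, the parameters i and j (sizes of the first and second sets), and the sample_links data are all assumptions made for illustration; the expansion and pruning steps are not shown.

```python
from itertools import combinations


def find_cores(links, i, j):
    """Enumerate (first_set, second_set) pairs in which each of the i pages
    in first_set links to every one of the j pages in second_set.

    `links` maps a page name to the pages it points to. Assumes i >= 1.
    This is a naive brute-force search, suitable only for tiny graphs.
    """
    cores = []
    for first_set in combinations(sorted(links), i):
        # Pages that every member of first_set points to.
        common = set.intersection(*(set(links[p]) for p in first_set))
        for second_set in combinations(sorted(common), j):
            cores.append((first_set, second_set))
    return cores


# Hypothetical toy link graph: page -> pages it links to.
sample_links = {
    "a": ["x", "y", "z"],
    "b": ["x", "y"],
    "c": ["x", "y", "q"],
    "d": ["q"],
}

cores = find_cores(sample_links, 3, 2)
print(cores)
```

On this toy graph, pages a, b, and c all point to both x and y, so the only core with three pages in the first set and two in the second is ((a, b, c), (x, y)). A search over a real Web-scale graph would need a far more scalable strategy than exhaustive enumeration.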

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.