Patent · US Expired

System and method for storing connectivity information in a web database

US7028039B2 · kind B2 · utility

14Cited by
1References
17Claims
0Family size

Assignee

Inventors

Key dates

Filing dateJan 18, 2001
Grant dateApr 11, 2006
Priority date
Expiry dateAug 27, 2023

Classification

  • Technology area (CPC Y)Emerging Cross-Sectional Technologies
  • CPC primaryY10S707/99943
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

A web crawler system includes a central processing unit for performing computations in accordance with stored procedures and a network interface for accessing remotely located computers via a network. A web crawler module downloads pages from remotely located servers via the network interface. A first link processing module obtains page link information from the downloaded page; the page link information includes for each downloaded page a row of page identifiers of other pages. A second link processing module encodes the rows of page identifies in a space efficient manner. It arranges the rows of page identifiers in a particular order. For each respective row it identifies a prior row, if any, that best matches the respective row in accordance with predefined row match criteria, determines a set of deletes representing page identifiers in the identified prior row not in the respective row, and determines a set of adds representing page identifiers in the respective row not in the identifier prior row. The second link processing module delta encodes the set of deletes and delta encodes the set of adds for each respective row, and then Huffman codes the delta encoded set of deletes …

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.