Patent · US Expired

Adaptive web crawling using a statistical model

US7328401B2 · kind B2 · utility

Cited by	77
References	18
Claims	14
Family size	0

Assignee

Microsoft Corporation · US

Inventors

Kenji C. Obata · Arao, JP
Dmitriy Meyerzon · Bellevue, US

Key dates

Filing date	Dec 22, 2004
Grant date	Feb 5, 2008
Priority date	—
Expiry date	Mar 25, 2026

Classification

Technology area (CPC Y)	Emerging Cross-Sectional Technologies
CPC primary	Y10S707/99933
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A computer-based system and method of retrieving information pertaining to documents on a computer network is disclosed. The method includes selecting a set of documents to be accessed during a Web crawl by utilizing a statistical model to determine which previously retrieved documents are most likely to have changed since last accessed. The statistical model continuously improves its accuracy by training internal probability distributions to reflect the actual experience with change rate patterns of the documents accessed. The decision whether to access a document is based on the probability of change compared against a desired synchronization level, random selections, maximum limits on the amount of time since the document was last accessed, and other criteria. Once the decision to access is made, the document is checked for changes and this information is used to train the statistical model.
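The select, fetch, and train loop described in the abstract can be sketched as follows. This is a minimal, non-authoritative illustration, not the patented method: the abstract does not say what form the statistical model takes, so the sketch assumes a Beta-Bernoulli estimate keyed on each document's last few change observations. The names `ChangeModel`, `Document`, `should_crawl`, and `record_visit`, and the thresholds `sync_level`, `max_age_days`, and `random_rate`, are hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ChangeModel:
    """Assumed model: for each recent change-history pattern (1 = changed,
    0 = unchanged), keep pseudo-counts of whether the next visit found a change."""
    history_len: int = 3
    prior_changed: float = 1.0    # Laplace-style prior pseudo-counts (assumption)
    prior_unchanged: float = 1.0
    counts: dict = field(default_factory=dict)

    def _key(self, history):
        # Only the most recent observations define the pattern.
        return tuple(history[-self.history_len:])

    def probability_of_change(self, history):
        # Posterior mean of a Beta distribution over "changed since last visit".
        changed, unchanged = self.counts.get(self._key(history), (0, 0))
        return (changed + self.prior_changed) / (
            changed + unchanged + self.prior_changed + self.prior_unchanged
        )

    def train(self, history, changed):
        # Update the distribution for this pattern with the observed outcome.
        key = self._key(history)
        c, u = self.counts.get(key, (0, 0))
        self.counts[key] = (c + 1, u) if changed else (c, u + 1)


@dataclass
class Document:
    url: str
    days_since_access: float
    history: list = field(default_factory=list)  # 1 = changed, 0 = unchanged


def should_crawl(doc, model, sync_level=0.5, max_age_days=30.0,
                 random_rate=0.05, rng=random):
    """Hypothetical decision rule combining three of the abstract's criteria."""
    if doc.days_since_access >= max_age_days:
        return True  # cap on time since the document was last accessed
    if rng.random() < random_rate:
        return True  # random selection keeps the training data from being one-sided
    # Otherwise compare the predicted change probability to the sync level.
    return model.probability_of_change(doc.history) >= sync_level


def record_visit(doc, model, changed):
    """After fetching, train the model, then extend the document's history."""
    model.train(doc.history, changed)
    doc.history.append(1 if changed else 0)
    doc.days_since_access = 0.0
```

For example, after training the pattern `[1, 1, 1]` eight times as "changed", `probability_of_change([1, 1, 1])` is (8 + 1) / (8 + 2) = 0.9, so a two-day-old document with that history is selected at the default `sync_level`. A document whose pattern has only been seen unchanged falls below the threshold and is skipped unless the age cap or the random draw selects it. The abstract's "other criteria" are not modeled here.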

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.