Patent · US Active

Method and system for identifying product-related information on a web page

US7912755B2 · kind B2 · utility

Cited by: 59
References: 9
Claims: 18
Family size: 0

Assignee

Inventors

Key dates

Filing date: Sep 23, 2005
Grant date: Mar 22, 2011
Priority date
Expiry date: Oct 11, 2029

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06Q30/02
  • WIPO field: IT methods for management
  • WIPO sector: Electrical engineering

Abstract

A method and system is provided that in a fully automated manner crawls web sites and identifies specific types of web pages, then extracts targeted data from those web pages. One or more text nodes containing product-related information on a first web page are first identified, and the locations of those text nodes are described using one or more vectors. The vectors are then analyzed to identify one or more patterns and to generate a model from those patterns that discriminates between text nodes that contain product-related information and text nodes that do not contain product-related information on a second web page. The model can then be used to crawl web sites to identify and extract targeted data, or the model can be installed on a user's computer to identify and extract targeted information from web sites as the user is browsing.
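
The pipeline in the abstract (locate labeled text nodes on a first page, describe their locations as vectors, derive a model, apply it to a second page) can be illustrated with a toy sketch. This is not the patented method: it assumes a "vector" is simply the tag-and-class path from the document root to the text node, and the "model" is just the set of paths seen for labeled nodes. The names TextNodeLocator, locate_text_nodes, learn_model, extract, and the two sample pages are all hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the path.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TextNodeLocator(HTMLParser):
    """Collects (path_vector, text) pairs for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self._stack = []  # open elements from the root down to the current node
        self.nodes = []   # list of (path_vector, text)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        css_class = dict(attrs).get("class") or ""
        self._stack.append(f"{tag}.{css_class}" if css_class else tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open element, if there is one.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i].split(".")[0] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append((tuple(self._stack), text))


def locate_text_nodes(html):
    """Return every text node on the page with its location vector."""
    parser = TextNodeLocator()
    parser.feed(html)
    parser.close()
    return parser.nodes


def learn_model(first_page_html, product_texts):
    """Toy model (assumption): the set of paths whose text was labeled product-related."""
    return {
        path
        for path, text in locate_text_nodes(first_page_html)
        if text in product_texts
    }


def extract(model, second_page_html):
    """Keep text nodes on an unseen page whose location matches the model."""
    return [text for path, text in locate_text_nodes(second_page_html) if path in model]


PAGE_ONE = """
<html><body>
<div class="nav">Home</div>
<div class="item"><h1 class="title">Acme Kettle</h1><span class="price">$24.99</span></div>
<div class="footer">Contact us</div>
</body></html>
"""

PAGE_TWO = """
<html><body>
<div class="nav">Home</div>
<div class="item"><h1 class="title">Bolt Toaster</h1><span class="price">$39.50</span></div>
<div class="footer">About</div>
</body></html>
"""

model = learn_model(PAGE_ONE, {"Acme Kettle", "$24.99"})
print(extract(model, PAGE_TWO))  # ['Bolt Toaster', '$39.50']
```

Exact path matching only generalizes across pages that share a template. A more realistic model would have to tolerate structural variation, which is presumably what the pattern analysis in the abstract addresses.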

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.