Method and system for identifying product-related information on a web page
US7912755B2 · kind B2 · utility
Key dates
| Event | Date |
|---|---|
| Filing date | Sep 23, 2005 |
| Grant date | Mar 22, 2011 |
| Priority date | — |
| Expiry date | Oct 11, 2029 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06Q30/02
- WIPO field: IT methods for management
- WIPO sector: Electrical engineering
Abstract
A method and system are provided that, in a fully automated manner, crawl web sites, identify specific types of web pages, and then extract targeted data from those web pages. One or more text nodes containing product-related information on a first web page are first identified, and the locations of those text nodes are described using one or more vectors. The vectors are then analyzed to identify one or more patterns and to generate a model from those patterns that discriminates, on a second web page, between text nodes that contain product-related information and text nodes that do not. The model can then be used to crawl web sites to identify and extract targeted data, or it can be installed on a user's computer to identify and extract targeted information from web sites as the user browses.
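The pipeline the abstract outlines (locate text nodes, encode each node's location as a vector, learn patterns from a labeled first page, then classify text nodes on a second page) can be illustrated with a minimal Python sketch. This is a hypothetical illustration, not the patented method: the location vector is assumed to be the path of (tag, class) pairs from the document root, the "model" is assumed to be the set of paths that were mostly product nodes in training, and the names `text_nodes`, `train_model`, `classify` and the sample pages are invented for illustration. Only the standard library's `html.parser` and `collections` are used.

```python
from collections import Counter
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the path stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TextNodeCollector(HTMLParser):
    """Collects (location_vector, text) pairs for every non-blank text node.

    Assumption: the location vector is the tuple of (tag, class) pairs from
    the root down to the text node (an illustrative encoding).
    """

    def __init__(self):
        super().__init__()
        self.stack = []   # current path of (tag, class) pairs
        self.nodes = []   # collected (vector, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        cls = dict(attrs).get("class") or ""  # attribute values may be None
        self.stack.append((tag, cls))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only text between tags
            self.nodes.append((tuple(self.stack), text))


def text_nodes(html):
    parser = TextNodeCollector()
    parser.feed(html)
    parser.close()
    return parser.nodes


def train_model(labeled_nodes):
    """Hypothetical model: keep each vector seen mostly on product nodes."""
    pos, total = Counter(), Counter()
    for vector, is_product in labeled_nodes:
        total[vector] += 1
        if is_product:
            pos[vector] += 1
    return {v for v in total if pos[v] * 2 > total[v]}


def classify(model, html):
    """Label each text node on a new page by whether its vector is in the model."""
    return [(text, vector in model) for vector, text in text_nodes(html)]


# First (training) page, with product text labeled by hand.
page1 = """<html><body>
<div class="nav">Home</div>
<div class="product"><span class="name">Blue Kettle</span>
<span class="price">$24.99</span></div>
<div class="footer">Contact us</div>
</body></html>"""
product_texts = {"Blue Kettle", "$24.99"}
model = train_model([(v, t in product_texts) for v, t in text_nodes(page1)])

# Second page built from the same template.
page2 = """<html><body>
<div class="nav">Home</div>
<div class="product"><span class="name">Red Toaster</span>
<span class="price">$39.00</span></div>
<div class="footer">About</div>
</body></html>"""
for text, is_product in classify(model, page2):
    print(text, is_product)
```

Exact path matching only generalizes across pages that share a template, such as product pages from a single retailer. A model meant to discriminate across differently structured sites would need looser patterns or a statistical classifier over features of the vectors.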
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.