Patent · US Active

Learning characteristics for extraction of information from web pages

US9443250B1 · kind B1 · utility

Cited by: 0
References: 2
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Apr 9, 2013
Grant date: Sep 13, 2016
Priority date
Expiry date: Jul 16, 2035

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06Q30/0641
  • WIPO field: IT methods for management
  • WIPO sector: Electrical engineering

Abstract

A learning module of an information retrieval system is configured to automatically learn distinctive characteristics used by different web sites when presenting data variables of interest. The learned information can then be used to identify data variables of interest on arbitrary web pages of the web sites. In one embodiment, the learning process is guided by feeds provided by the web sites that list values for data variables of interest, and by web pages also provided by the web sites. The values of the feeds enable the learning module to identify candidate portions of the web pages that may represent a data variable of interest. Weights are computed for different values of various properties of the candidate portions, aggregated over all the analyzed pages, and used to identify one of the candidate portions as being the best candidate.
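
The process described in the abstract can be illustrated with a minimal sketch. It is not the patented method: the data model (a page as a list of portions, each with a "text" and a "props" dictionary), the function names learn_weights and best_candidate, the exact-match rule for finding candidates, and the even split of credit among a page's candidates are all assumptions made for illustration only.

```python
from collections import Counter

def learn_weights(pages, feed_values):
    """Aggregate a weight per (property, value) pair over all analyzed pages.

    pages: list of pages, each a list of portions (hypothetical data model).
    feed_values: the feed's value for the variable of interest, one per page.
    """
    weights = Counter()
    for portions, feed_value in zip(pages, feed_values):
        # Candidate portions: those whose text matches the feed value (assumed rule).
        candidates = [p for p in portions if p["text"] == feed_value]
        for cand in candidates:
            for prop, value in cand["props"].items():
                # Assumption: split one unit of credit evenly among a page's candidates.
                weights[(prop, value)] += 1.0 / len(candidates)
    return weights

def best_candidate(portions, weights):
    """Pick the portion whose property values carry the most learned weight."""
    def score(portion):
        return sum(weights[(k, v)] for k, v in portion["props"].items())
    return max(portions, key=score)

# Two training pages from one site; the feed lists the price shown on each.
page1 = [
    {"text": "$19.99", "props": {"tag": "span", "class": "price"}},
    {"text": "$19.99", "props": {"tag": "td", "class": "related"}},
    {"text": "Free shipping", "props": {"tag": "div", "class": "promo"}},
]
page2 = [
    {"text": "$5.00", "props": {"tag": "span", "class": "price"}},
    {"text": "Blue mug", "props": {"tag": "h1", "class": "title"}},
]
weights = learn_weights([page1, page2], ["$19.99", "$5.00"])

# An arbitrary page from the same site, with no feed value available.
new_page = [
    {"text": "$7.25", "props": {"tag": "span", "class": "price"}},
    {"text": "$9.00", "props": {"tag": "td", "class": "related"}},
]
best = best_candidate(new_page, weights)
print(best["text"])
```

In this toy data the ambiguous match on the first page (two portions show the feed's price) is resolved by aggregation: the span/price combination recurs on the second page, so it accumulates more weight than the td/related combination and wins on the unseen page.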

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.