Patent · US Active

Flexible and scalable structured web data extraction

US8856129B2 · kind B2 · utility

Cited by: 8
References: 5
Claims: 18
Family size: 0

Key dates

Filing date: Sep 20, 2011
Grant date: Oct 7, 2014
Expiry date: Oct 2, 2032

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/355
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

This document describes techniques that label text nodes of a seed site for each of a plurality of verticals. Once a seed site is labeled for a given vertical, the techniques extract features from the labeled text nodes of the seed site. The techniques learn vertical knowledge for the seed site based on the human labels and the extracted features, and adapt the learned vertical knowledge to a new web site to automatically and accurately identify attributes and extract attribute values targeted within a given vertical for structured web data extraction.
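The abstract outlines a four-step flow: humans label text nodes on a seed site, features are extracted from those labeled nodes, per-vertical knowledge is learned from labels plus features, and that knowledge is applied to a new site to pick out attribute values. The following is a minimal Python sketch of that flow under heavy simplifying assumptions. `TextNode`, the feature set (enclosing tag, digit and currency flags, lowercase tokens), the per-attribute feature profiles, cosine-similarity scoring and the 0.3 threshold are all hypothetical stand-ins chosen for illustration. They are not the patented method or its claims.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class TextNode:
    tag: str   # hypothetical: enclosing HTML tag, a crude stand-in for DOM context
    text: str  # the node's visible text


def extract_features(node: TextNode) -> Counter:
    """Bag of hypothetical layout and content features for one text node."""
    feats: Counter = Counter()
    feats[f"tag:{node.tag}"] += 1
    if re.search(r"\d", node.text):
        feats["has_digit"] += 1
    if re.search(r"[$€£]", node.text):
        feats["has_currency"] += 1
    for tok in re.findall(r"[a-z]+", node.text.lower()):
        feats[f"tok:{tok}"] += 1
    return feats


def learn_vertical_knowledge(
    labeled: List[Tuple[TextNode, Optional[str]]]
) -> Dict[str, Counter]:
    """Sum the features of human-labeled seed nodes into one profile per attribute.

    Nodes labeled None (not an attribute of the vertical) are ignored.
    """
    profiles: Dict[str, Counter] = {}
    for node, attr in labeled:
        if attr is not None:
            profiles.setdefault(attr, Counter()).update(extract_features(node))
    return profiles


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def extract(nodes: List[TextNode], profiles: Dict[str, Counter],
            threshold: float = 0.3) -> Dict[str, str]:
    """Give each node its best-matching attribute; keep the top node per attribute."""
    best: Dict[str, Tuple[float, TextNode]] = {}
    for node in nodes:
        feats = extract_features(node)
        attr, score = max(((a, cosine(feats, p)) for a, p in profiles.items()),
                          key=lambda t: t[1])
        if score >= threshold and (attr not in best or score > best[attr][0]):
            best[attr] = (score, node)
    return {attr: node.text for attr, (_, node) in best.items()}


# Hypothetical "book" vertical: two labeled pages from one seed site.
seed_site = [
    (TextNode("h1", "The Pragmatic Programmer"), "title"),
    (TextNode("span", "by Andrew Hunt"), "author"),
    (TextNode("span", "$39.99"), "price"),
    (TextNode("a", "Add to cart"), None),
    (TextNode("h1", "Clean Code"), "title"),
    (TextNode("span", "by Robert Martin"), "author"),
    (TextNode("span", "$42.50"), "price"),
]
new_site = [
    TextNode("h1", "Refactoring"),
    TextNode("div", "by Martin Fowler"),
    TextNode("span", "$47.99"),
    TextNode("a", "Sign in"),
]

profiles = learn_vertical_knowledge(seed_site)
result = extract(new_site, profiles)
print(result)
```

On this toy data the sketch recovers `Refactoring` as the title, `by Martin Fowler` as the author and `$47.99` as the price, and it drops the "Sign in" link. The profiles only record features the seed site happened to use, such as an `h1` title and `span` prices. As a result, this naive version transfers only to sites with similar markup. The abstract's adaptation step is where sites whose layout differs from the seed site would have to be handled. This sketch does not attempt that.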

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.