Flexible and scalable structured web data extraction
US8856129B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Sep 20, 2011 |
| Grant date | Oct 7, 2014 |
| Priority date | — |
| Expiry date | Oct 2, 2032 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/355
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
This document describes techniques that label text nodes of a seed site for each of a plurality of verticals. Once a seed site is labeled for a given vertical, the techniques extract features from the labeled text nodes of the seed site. The techniques learn vertical knowledge for the seed site based on the human labels and the extracted features, and adapt the learned vertical knowledge to a new web site to automatically and accurately identify attributes and extract attribute values targeted within a given vertical for structured web data extraction.
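The pipeline in the abstract (label seed-site text nodes, extract features, learn vertical knowledge, adapt it to a new site) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the feature set, the count-based scoring, the function names `features`, `learn_vertical_knowledge`, `score`, and `extract`, and the toy "book" vertical with `title` and `price` attributes. The patented techniques are likely considerably richer than this.

```python
from collections import Counter, defaultdict


def features(text):
    """Toy content features for one text node (hypothetical feature set)."""
    tokens = text.split()
    return {
        "n_tokens": min(len(tokens), 5),           # capped token count
        "has_digit": any(c.isdigit() for c in text),
        "has_currency": "$" in text,
        "title_case": text.istitle(),
    }


def learn_vertical_knowledge(labeled_nodes):
    """Count how often each (feature, value) pair co-occurs with each attribute.

    labeled_nodes: iterable of (text, attribute) pairs from a labeled seed site.
    """
    knowledge = defaultdict(Counter)
    for text, attribute in labeled_nodes:
        for item in features(text).items():
            knowledge[attribute][item] += 1
    return knowledge


def score(knowledge, text, attribute):
    """Sum the seed-site counts of the feature values observed on this node."""
    counts = knowledge[attribute]
    return sum(counts[item] for item in features(text).items())


def extract(knowledge, new_site_nodes):
    """For each attribute, pick the best-scoring text node on the new site."""
    return {
        attribute: max(new_site_nodes, key=lambda t: score(knowledge, t, attribute))
        for attribute in knowledge
    }


# Usage: a hand-labeled seed site for a hypothetical "book" vertical.
seed_site = [
    ("The Great Gatsby", "title"),
    ("$12.99", "price"),
    ("Moby Dick", "title"),
    ("$8.50", "price"),
]
knowledge = learn_vertical_knowledge(seed_site)

# Unlabeled text nodes from a different site in the same vertical.
new_site = ["Pride And Prejudice", "$7.25", "add to cart"]
print(extract(knowledge, new_site))
# prints {'title': 'Pride And Prejudice', 'price': '$7.25'}
```

The sketch uses only content features of the text itself, so nothing site-specific (such as page layout) is carried from the seed site to the new site; that is one simple way to make learned knowledge transferable, chosen here for brevity.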
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.