Hierarchical conditional random fields for web extraction
US7720830B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jul 31, 2006 |
| Grant date | May 18, 2010 |
| Priority date | — |
| Expiry date | Dec 26, 2027 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/904
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method and system for labeling object information of an information page is provided. A labeling system identifies an object record of an information page based on the labeling of object elements within an object record and labels object elements based on the identification of an object record that contains the object elements. To identify the records and label the elements, the labeling system generates a hierarchical representation of blocks of an information page. The labeling system identifies records and elements within the records by propagating probability-related information of record labels and element labels through the hierarchy of the blocks. The labeling system generates a feature vector for each block to represent the block and calculates a probability of a label for a block being correct based on a score derived from the feature vectors associated with related blocks. The labeling system searches for the labeling of records and elements that has the highest probability of being correct.
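The scoring-and-search idea in the abstract can be illustrated with a small sketch. It is not the patented method. It assumes a toy log-linear model over a block tree, and it finds the highest-probability labeling by exhaustive enumeration, which only works for tiny trees. The label set, the two-dimensional feature vectors, the weights in `NODE_W` and `EDGE_W`, and all function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from itertools import product
from math import exp

LABELS = ("record", "element", "other")  # hypothetical label set


@dataclass
class Block:
    name: str
    features: tuple  # hypothetical feature vector for this block
    children: list = field(default_factory=list)


# Hypothetical weights: one weight vector per label, plus a bonus for
# compatible (parent label, child label) pairs.
NODE_W = {"record": (1.0, 0.0), "element": (0.0, 1.0), "other": (0.2, 0.2)}
EDGE_W = {("record", "element"): 1.0, ("other", "record"): 0.5}


def flatten(root):
    """Return (block, parent_index) pairs in pre-order."""
    out = []

    def visit(block, parent):
        idx = len(out)
        out.append((block, parent))
        for child in block.children:
            visit(child, idx)

    visit(root, None)
    return out


def score(nodes, labeling):
    """Sum per-block feature scores and parent/child compatibility scores."""
    total = 0.0
    for i, (block, parent) in enumerate(nodes):
        weights = NODE_W[labeling[i]]
        total += sum(w * f for w, f in zip(weights, block.features))
        if parent is not None:
            total += EDGE_W.get((labeling[parent], labeling[i]), 0.0)
    return total


def best_labeling(root):
    """Enumerate every joint labeling and return the most probable one."""
    nodes = flatten(root)
    scored = [(score(nodes, lab), lab)
              for lab in product(LABELS, repeat=len(nodes))]
    z = sum(exp(s) for s, _ in scored)  # normalizer over all labelings
    s, lab = max(scored)
    names = {nodes[i][0].name: lab[i] for i in range(len(nodes))}
    return names, exp(s) / z


# Toy page: one row block containing two leaf blocks.
page = Block("page", (0.0, 0.0), [
    Block("row", (1.0, 0.0), [
        Block("name", (0.0, 1.0)),
        Block("price", (0.0, 1.0)),
    ]),
])
labels, prob = best_labeling(page)
```

In this toy setup the record label on `row` and the element labels on its children reinforce each other through the edge weights, which loosely mirrors the abstract's point that record identification and element labeling inform one another. A practical system would replace the enumeration with an efficient inference procedure over the hierarchy.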
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.