Patent · US Active

Information extraction using spatial reasoning on the CSS2 visual box model

US8719291B2 · kind B2 · utility

Cited by: 3
References: 1
Claims: 3
Family size: 0

Assignee

Inventors

Key dates

Filing date: Apr 24, 2008
Grant date: May 6, 2014
Priority date
Expiry date: Jul 18, 2029

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/95
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method for extracting tabular information from a web source, comprising: determining a plurality of coordinates for a plurality of visualized element nodes on the web source; determining a subset of the plurality of visualized element nodes based on the plurality of coordinates to obtain a candidate web table, wherein each node of the subset constitutes a logical cell of the candidate web table; determining textual content corresponding to the subset of the plurality of visualized element nodes as the textual content would appear after rendering the web source in a browser; transforming the candidate web table into an explicit representation of the relative spatial relations among the logical cells; and saving the explicit representation in a structured document format.
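
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the `Box` type, the function names, the pixel tolerance, and the XML layout are all hypothetical choices made for illustration, and the rendered coordinates are assumed to have already been obtained from a browser. Boxes whose top or left edges nearly coincide are treated as sharing a row or column, and the resulting grid is saved as XML.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Rendered bounding box of one visualized element node (hypothetical type)."""
    text: str
    left: float
    top: float
    width: float
    height: float


def cluster(values, tol):
    """Return one representative per group of nearby coordinates.

    A value starts a new group when it is more than `tol` away from the
    representative (first value) of the current group.
    """
    reps = []
    for v in sorted(values):
        if not reps or v - reps[-1] > tol:
            reps.append(v)
    return reps


def nearest_index(reps, v):
    """Index of the representative closest to coordinate `v`."""
    return min(range(len(reps)), key=lambda i: abs(reps[i] - v))


def boxes_to_grid(boxes, tol=5.0):
    """Place each box in a (row, column) cell using its top and left edges."""
    rows = cluster((b.top for b in boxes), tol)
    cols = cluster((b.left for b in boxes), tol)
    grid = [[None] * len(cols) for _ in rows]
    for b in boxes:
        grid[nearest_index(rows, b.top)][nearest_index(cols, b.left)] = b.text
    return grid


def grid_to_xml(grid):
    """Serialize the grid so each cell records its row and column explicitly."""
    table = ET.Element("table")
    for r, row in enumerate(grid):
        for c, text in enumerate(row):
            if text is None:
                continue  # no element node was rendered at this position
            cell = ET.SubElement(table, "cell", row=str(r), col=str(c))
            cell.text = text
    return ET.tostring(table, encoding="unicode")


# Made-up coordinates standing in for values reported by a rendering engine.
boxes = [
    Box("Name", 10, 10, 80, 20),
    Box("Price", 100, 11, 60, 20),
    Box("Tea", 10, 40, 80, 20),
    Box("4.50", 101, 41, 60, 20),
]
grid = boxes_to_grid(boxes)
xml_doc = grid_to_xml(grid)
```

Clustering on edge coordinates rather than on markup means the sketch would recover a table laid out with positioned `div` elements as readily as one built from `table` tags, which is the point of reasoning over the visual box model. Spanning cells and nested tables are outside what this sketch handles.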

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.