Information extraction using spatial reasoning on the CSS2 visual box model
US8719291B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Apr 24, 2008 |
| Grant date | May 6, 2014 |
| Priority date | — |
| Expiry date | Jul 18, 2029 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/95
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method for extracting tabular information from a web source by determining a plurality of coordinates for a plurality of visualized element nodes on the web source; determining a subset of the plurality of visualized element nodes based on the plurality of coordinates to obtain a candidate web table, wherein each of the subset of the plurality of visualized element nodes constitutes a logical cell of the candidate web table; determining textual content corresponding to the subset of the plurality of visualized element nodes as the textual content would appear after rendering the web source in a browser; transforming the candidate web table into an explicit representation of relative spatial relation between at least one of the logical cells; and saving the explicit representation in a structured document format.
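The steps in the abstract can be illustrated with a minimal sketch. It is not the patented method: it assumes the rendered bounding boxes and visible text of the element nodes are already available (for example, collected from a browser), aligns boxes into rows and columns with a simple pixel tolerance, and saves the result as JSON. The names `Box`, `cluster`, `band_index`, and `boxes_to_grid`, the tolerance value, and the choice of JSON as the structured format are all hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Rendered bounding box of one visualized element node (hypothetical type)."""
    text: str      # text as it would appear after rendering
    left: float
    top: float
    right: float   # kept for completeness; this sketch aligns on top/left only
    bottom: float


def cluster(values, tol):
    """Return one representative per band; a value joins the last band
    if it lies within tol of that band's first (smallest) value."""
    bands = []
    for v in sorted(values):
        if bands and v - bands[-1] <= tol:
            continue
        bands.append(v)
    return bands


def band_index(value, bands, tol):
    """Index of the band that contains value."""
    for i, start in enumerate(bands):
        if 0 <= value - start <= tol:
            return i
    raise ValueError(f"{value} does not fall in any band")


def boxes_to_grid(boxes, tol=5.0):
    """Assign each box a (row, col) position from its top/left coordinates.
    Assumption: cells in one row share a top edge and cells in one column
    share a left edge, up to tol pixels."""
    row_bands = cluster([b.top for b in boxes], tol)
    col_bands = cluster([b.left for b in boxes], tol)
    cells = [
        {
            "row": band_index(b.top, row_bands, tol),
            "col": band_index(b.left, col_bands, tol),
            "text": b.text,
        }
        for b in boxes
    ]
    cells.sort(key=lambda c: (c["row"], c["col"]))
    return cells


# Made-up coordinates for a 2x2 table, slightly misaligned on purpose.
example_boxes = [
    Box("Name", 10, 10, 90, 30),
    Box("Price", 100, 11, 160, 30),
    Box("Tea", 10, 40, 90, 60),
    Box("3.50", 101, 41, 160, 60),
]

# Save the explicit row/column representation in a structured format.
print(json.dumps({"cells": boxes_to_grid(example_boxes)}, indent=2))
```

The row and column indices make the spatial relation between cells explicit (same row, same column, adjacent), independent of the markup that produced the layout. A real extractor would also need to handle spanning cells, nested boxes, and non-table content, none of which this sketch attempts.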
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.