System and method for extracting tabular data from electronic document
US10970535B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Mar 27, 2019 |
| Grant date | Apr 6, 2021 |
| Priority date | — |
| Expiry date | Jul 4, 2039 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V30/10
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Disclosed is a system for extracting tabular data from an electronic document, the system having a data processing arrangement comprising: a tabular data detection module that is operable to: (i) receive the electronic document; (ii) determine the location of the tabular data within the electronic document; and (iii) extract an image of the tabular data from the electronic document; and a tabular data extraction module that receives the extracted image of the tabular data from the tabular data detection module, wherein the tabular data extraction module is operable to: (i) convert the received image of the tabular data into a greyscale image; (ii) extract a grid structure from the greyscale image; (iii) remove the grid structure from the greyscale image; (iv) determine positions for placement of horizontal and vertical lines in the greyscale image; (v) generate the horizontal and vertical lines on the greyscale image; (vi) perform optical character recognition of text associated with the tabular data from the received image; and (vii) extract the tabular data by combining information of the grid structure with the text, to generate the tabular data.
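The extraction steps in the abstract can be illustrated with a minimal pure-Python sketch. It is not the patented implementation: the function names, the thresholds (`dark`, `min_fraction`), and the list-of-lists image representation are all assumptions made for illustration. The sketch also simplifies steps (iv) and (v) by reusing the detected grid positions as the cell boundaries, and it takes the OCR engine as a caller-supplied function rather than naming a real library.

```python
from typing import Callable, List, Tuple

RGBImage = List[List[Tuple[int, int, int]]]  # rows of (R, G, B) pixels
GreyImage = List[List[int]]                  # rows of 0-255 intensities

def to_greyscale(img: RGBImage) -> GreyImage:
    # Step (i): weighted luma conversion (ITU-R BT.601 weights).
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in img]

def find_grid(grey: GreyImage, dark: int = 128,
              min_fraction: float = 0.9) -> Tuple[List[int], List[int]]:
    # Step (ii): a row or column counts as a grid line when at least
    # min_fraction of its pixels are darker than the `dark` threshold
    # (both values are illustrative assumptions).
    h, w = len(grey), len(grey[0])
    rows = [y for y in range(h)
            if sum(p < dark for p in grey[y]) >= min_fraction * w]
    cols = [x for x in range(w)
            if sum(grey[y][x] < dark for y in range(h)) >= min_fraction * h]
    return rows, cols

def remove_grid(grey: GreyImage, rows: List[int], cols: List[int],
                background: int = 255) -> GreyImage:
    # Step (iii): paint the detected lines with the background colour.
    out = [row[:] for row in grey]
    for y in rows:
        out[y] = [background] * len(out[y])
    for x in cols:
        for row in out:
            row[x] = background
    return out

def cell_boxes(rows: List[int], cols: List[int]):
    # Simplified stand-in for steps (iv)-(v): cells are the gaps between
    # consecutive detected lines, as (top, bottom, left, right) with
    # exclusive bottom/right bounds. Adjacent indices (thick lines) are skipped.
    def gaps(lines: List[int]):
        return [(a + 1, b) for a, b in zip(lines, lines[1:]) if b - a > 1]
    return [[(t, b, l, r) for l, r in gaps(cols)] for t, b in gaps(rows)]

def extract_table(clean: GreyImage, boxes,
                  ocr: Callable[[GreyImage], str]) -> List[List[str]]:
    # Steps (vi)-(vii): run the supplied OCR function on each cell crop and
    # arrange the results according to the grid structure.
    return [[ocr([row[l:r] for row in clean[t:b]]) for t, b, l, r in box_row]
            for box_row in boxes]

def demo_image() -> RGBImage:
    # Synthetic 7x9 image: a 2x2 table with a few "ink" pixels in two cells.
    white, black = (255, 255, 255), (0, 0, 0)
    img = [[white] * 9 for _ in range(7)]
    for y in (0, 3, 6):
        img[y] = [black] * 9
    for x in (0, 4, 8):
        for row in img:
            row[x] = black
    img[1][2] = black
    img[4][5] = black
    img[5][6] = black
    return img

def fake_ocr(crop: GreyImage) -> str:
    # Placeholder for a real OCR engine: reports the number of dark pixels.
    return str(sum(p < 128 for row in crop for p in row))

grey = to_greyscale(demo_image())
grid_rows, grid_cols = find_grid(grey)
clean = remove_grid(grey, grid_rows, grid_cols)
table = extract_table(clean, cell_boxes(grid_rows, grid_cols), fake_ocr)
print(table)
```

Passing the OCR step in as a function keeps the sketch self-contained; a real pipeline would substitute an actual OCR engine and would likely need more robust line detection for skewed or broken grid lines.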
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.