Table recognition in portable document format documents
US11200413B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Event | Date |
|---|---|
| Filing date | Jul 31, 2018 |
| Grant date | Dec 14, 2021 |
| Priority date | — |
| Expiry date | Feb 20, 2039 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V30/416
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Methods, systems, and computer program products for table recognition in PDF documents are provided herein. A computer-implemented method includes discretizing one or more contiguous areas of a PDF document; identifying one or more white-space separator lines within the one or more discretized contiguous areas of the PDF document; detecting one or more candidate table regions within the one or more discretized contiguous areas of the PDF document by clustering the one or more white-space separator lines into one or more grids; and outputting at least one of the candidate table regions as a finalized table in accordance with scores assigned to each of the one or more candidate table regions based on (i) border information and (ii) cell structure information.
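The abstract names four steps (discretize, find white-space separator lines, cluster them into grids, score candidates on border and cell-structure information). The Python sketch below illustrates that pipeline under loudly stated assumptions and is not the patented method. It assumes each page area arrives as text bounding boxes, treats runs of empty grid rows and columns as white-space separator lines, "clusters" separators by greedily merging consecutive text bands that share vertical gaps, and uses invented scoring: horizontal ruling lines just above or below a candidate count as border evidence, and the fraction of non-empty cells counts as cell-structure evidence. Every function name (`discretize`, `cluster_grids`, `score`, `recognize_tables`), parameter, weight, and the 0.7 threshold is hypothetical.

```python
from dataclasses import dataclass


def discretize(boxes, width, height, cell=1.0):
    """Rasterize text boxes (x0, y0, x1, y1) into a boolean occupancy grid."""
    rows, cols = int(height // cell), int(width // cell)
    grid = [[False] * cols for _ in range(rows)]
    for x0, y0, x1, y1 in boxes:
        # Floor the start edge and ceil the end edge so partly covered cells count.
        for r in range(int(y0 // cell), min(rows, int(-(-y1 // cell)))):
            for c in range(int(x0 // cell), min(cols, int(-(-x1 // cell)))):
                grid[r][c] = True
    return grid


def runs(flags):
    """Return inclusive (start, end) pairs for maximal runs of True values."""
    out, start = [], None
    for i, flag in enumerate(list(flags) + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            out.append((start, i - 1))
            start = None
    return out


def col_runs(cols, width):
    """Group a set of empty column indices into contiguous separator runs."""
    return runs(c in cols for c in range(width))


def empty_columns(grid, band):
    """Empty columns strictly between the first and last glyph of a text band."""
    r0, r1 = band
    occupied = [any(grid[r][c] for r in range(r0, r1 + 1)) for c in range(len(grid[0]))]
    first = occupied.index(True)
    last = len(occupied) - 1 - occupied[::-1].index(True)
    return {c for c in range(first, last + 1) if not occupied[c]}


@dataclass
class Candidate:
    bands: list     # (first_row, last_row) of each text band, i.e. table row
    col_seps: list  # (first_col, last_col) of each vertical white-space separator
    score: float = 0.0


def cluster_grids(grid, min_rows=2, min_seps=1):
    """Greedily merge consecutive text bands that share vertical separators."""
    width = len(grid[0])
    # Horizontal white-space separators are the empty rows between these bands.
    bands = runs(any(row) for row in grid)
    candidates, i = [], 0
    while i < len(bands):
        shared, j = empty_columns(grid, bands[i]), i + 1
        while j < len(bands):
            merged = shared & empty_columns(grid, bands[j])
            if len(col_runs(merged, width)) < min_seps:
                break
            shared, j = merged, j + 1
        seps = col_runs(shared, width)
        if j - i >= min_rows and len(seps) >= min_seps:
            candidates.append(Candidate(bands[i:j], seps))
        i = j
    return candidates


def score(grid, cand, rulings, margin=2, w_border=0.4, w_cells=0.6):
    """Weighted mix of border and cell-structure evidence (weights are assumptions)."""
    top, bottom = cand.bands[0][0], cand.bands[-1][1]
    # Border evidence: horizontal ruling lines within `margin` rows above/below.
    above = any(top - margin <= r < top for r in rulings)
    below = any(bottom < r <= bottom + margin for r in rulings)
    border = (above + below) / 2
    # Cell-structure evidence: share of (band x column segment) cells with text.
    used = [c for c in range(len(grid[0]))
            if any(grid[r][c] for r in range(top, bottom + 1))]
    segments, start = [], used[0]
    for s, e in cand.col_seps:
        segments.append((start, s - 1))
        start = e + 1
    segments.append((start, used[-1]))
    filled = sum(
        any(grid[r][c] for r in range(b0, b1 + 1) for c in range(c0, c1 + 1))
        for b0, b1 in cand.bands
        for c0, c1 in segments
    )
    return w_border * border + w_cells * (filled / (len(cand.bands) * len(segments)))


def recognize_tables(grid, rulings=(), threshold=0.7):
    """Score each candidate grid and keep those at or above the threshold."""
    tables = []
    for cand in cluster_grids(grid):
        cand.score = score(grid, cand, rulings)
        if cand.score >= threshold:
            tables.append(cand)
    return tables
```

A small usage example with three text rows, two columns, and ruling lines at grid rows 0 and 6:

```python
boxes = [(0, 1, 3, 2), (6, 1, 9, 2),   # header row
         (0, 3, 3, 4), (6, 3, 8, 4),   # data row
         (0, 5, 2, 6), (6, 5, 9, 6)]   # data row
grid = discretize(boxes, width=12, height=7)
for table in recognize_tables(grid, rulings=(0, 6)):
    print(table.bands, table.col_seps, round(table.score, 2))
# prints: [(1, 1), (3, 3), (5, 5)] [(3, 5)] 1.0
```

Without the two ruling lines the same grid scores 0.6 and falls below the hypothetical threshold, which shows how border evidence can decide a candidate whose cell structure alone is ambiguous. The greedy band merge is only a stand-in; a practical system would likely cluster separators more robustly.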
Source: USPTO / EPO open patent data (objective bibliographic data and citation counts).