System and method for unsupervised density based table structure identification
US11347708B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Date type | Date |
|---|---|
| Filing date | Nov 11, 2019 |
| Grant date | May 31, 2022 |
| Priority date | — |
| Expiry date | Jun 17, 2040 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/28
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Embodiments described herein provide unsupervised density-based clustering to infer table structure from a document. Specifically, words are identified from a block of text in a noneditable document, and the spatial coordinates of each word relative to the rectangular region enclosing the block of text are identified. Based on the word density of the rectangular region, the words are grouped into clusters using a heuristic radius search method. Words grouped into the same cluster are determined to be elements of the same cell. In this way, the cells of the table structure can be identified. Once the cells are identified based on the word density of the block of text, they can be expanded horizontally or grouped vertically to identify the rows or columns of the table structure.
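The abstract does not specify the radius heuristic, the distance measure between words, or how cells are merged into rows and columns. The following Python sketch illustrates the general idea under stated assumptions and is not the patented method. It assumes (1) a search radius equal to a fraction (`scale`, hypothetical default 0.25) of the mean word spacing sqrt(area / word count) of the rectangle enclosing all words, so denser text yields a smaller radius; (2) a gap distance between word bounding boxes; (3) DBSCAN-style clustering with a minimum of one point, so words linked by chains of gaps within the radius form one cell; and (4) rows and columns formed by grouping cells whose vertical or horizontal extents overlap. All names (`Box`, `box_gap`, `density_radius`, `cluster_into_cells`, `group_by_overlap`) are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """A word or cell: text plus an axis-aligned bounding box (x grows right, y grows down)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def box_gap(a: Box, b: Box) -> float:
    """Euclidean gap between two boxes; 0 when they touch or overlap."""
    dx = max(0.0, max(a.x0, b.x0) - min(a.x1, b.x1))
    dy = max(0.0, max(a.y0, b.y0) - min(a.y1, b.y1))
    return sqrt(dx * dx + dy * dy)


def density_radius(words: List[Box], scale: float = 0.25) -> float:
    """Hypothetical heuristic: a fraction of the mean spacing sqrt(area / word count)
    of the rectangle enclosing all words. Denser text gives a smaller radius."""
    x0, y0 = min(w.x0 for w in words), min(w.y0 for w in words)
    x1, y1 = max(w.x1 for w in words), max(w.y1 for w in words)
    return scale * sqrt((x1 - x0) * (y1 - y0) / len(words))


def cluster_into_cells(words: List[Box], radius: float) -> List[Box]:
    """Radius search (DBSCAN with min_samples=1): words reachable through
    chains of gaps <= radius are merged into one cell."""
    labels = [-1] * len(words)
    n_clusters = 0
    for seed in range(len(words)):
        if labels[seed] != -1:
            continue
        labels[seed] = n_clusters
        stack = [seed]
        while stack:  # expand the cluster from every newly reached word
            j = stack.pop()
            for k, other in enumerate(words):
                if labels[k] == -1 and box_gap(words[j], other) <= radius:
                    labels[k] = n_clusters
                    stack.append(k)
        n_clusters += 1

    cells = []
    for c in range(n_clusters):
        # Join member words in reading order and take their joint bounding box.
        members = sorted((w for w, lab in zip(words, labels) if lab == c),
                         key=lambda w: (w.y0, w.x0))
        cells.append(Box(" ".join(w.text for w in members),
                         min(w.x0 for w in members), min(w.y0 for w in members),
                         max(w.x1 for w in members), max(w.y1 for w in members)))
    return cells


def group_by_overlap(cells: List[Box], axis: str) -> List[List[str]]:
    """Group cells whose extents overlap on one axis: axis='y' gives rows, axis='x' gives columns."""
    lo, hi = ("y0", "y1") if axis == "y" else ("x0", "x1")
    other = "x0" if axis == "y" else "y0"
    groups: List[Tuple[float, float, List[Box]]] = []
    for cell in sorted(cells, key=lambda c: getattr(c, lo)):
        c_lo, c_hi = getattr(cell, lo), getattr(cell, hi)
        for i, (g_lo, g_hi, members) in enumerate(groups):
            if c_lo < g_hi and c_hi > g_lo:  # extents overlap: extend this group
                members.append(cell)
                groups[i] = (min(g_lo, c_lo), max(g_hi, c_hi), members)
                break
        else:
            groups.append((c_lo, c_hi, [cell]))
    # Order each row left to right, or each column top to bottom.
    return [[c.text for c in sorted(m, key=lambda c: getattr(c, other))] for _, _, m in groups]


words = [
    Box("Name", 0, 0, 40, 10), Box("Age", 100, 0, 130, 10),
    Box("Alice", 0, 20, 35, 30), Box("Smith", 38, 20, 75, 30), Box("31", 100, 20, 115, 30),
]
cells = cluster_into_cells(words, density_radius(words))
print(group_by_overlap(cells, "y"))  # [['Name', 'Age'], ['Alice Smith', '31']]
print(group_by_overlap(cells, "x"))  # [['Name', 'Alice Smith'], ['Age', '31']]
```

In this toy layout the radius works out to about 7 units, so the 3-unit gap inside "Alice Smith" merges those words into one cell, while the 10-unit gap between lines keeps the rows apart. A fixed radius would not adapt to font size or spacing; deriving it from word density is what lets the clustering remain unsupervised.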
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.