Patent · US Active

System and method for unsupervised density based table structure identification

US11347708B2 · kind B2 · utility

Cited by: 1
References: 13
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Nov 11, 2019
Grant date: May 31, 2022
Priority date
Expiry date: Jun 17, 2040

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/28
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Embodiments described herein provide unsupervised density-based clustering to infer table structure from a document. Specifically, a number of words are identified from a block of text in a noneditable document, and the spatial coordinates of each word relative to the rectangular region are identified. Based on the word density of the rectangular region, the words are grouped into clusters using a heuristic radius search method. Words that are grouped into the same cluster are determined to be elements that belong to the same cell. In this way, the cells of the table structure can be identified. Once the cells are identified based on the word density of the block of text, the identified cells can be expanded horizontally or grouped vertically to identify rows or columns of the table structure.
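
The abstract does not spell out the heuristic radius search, so the following Python sketch is only an illustration of the general idea, not the patented method. It assumes each word is given as a hypothetical (text, x, y) tuple holding its center coordinates inside the rectangular region, derives a search radius from the word density of the region (the function heuristic_radius and its scale constant are assumptions made for this example), and then groups words that lie within that radius of one another, directly or through a chain of neighbors, into the same cell.

```python
import math
from collections import deque


def heuristic_radius(num_words, width, height, scale=0.5):
    """Assumed heuristic: the radius shrinks as word density grows.

    1 / sqrt(density) is the typical spacing between words if they were
    spread uniformly over the region; `scale` is an arbitrary tuning knob.
    """
    density = num_words / (width * height)  # words per unit area
    return scale / math.sqrt(density)


def cluster_words(words, radius):
    """Label words so that words chained together by distance <= radius
    share a label. Each word is a (text, x, y) tuple."""
    labels = [-1] * len(words)  # -1 means "not yet assigned"
    next_label = 0
    for start in range(len(words)):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:  # breadth-first growth of the current cluster
            j = queue.popleft()
            _, xj, yj = words[j]
            for k, (_, xk, yk) in enumerate(words):
                if labels[k] == -1 and math.hypot(xk - xj, yk - yj) <= radius:
                    labels[k] = next_label
                    queue.append(k)
        next_label += 1
    return labels


# Made-up example: a 200 x 100 region holding four cells.
words = [
    ("Unit", 10, 10), ("price", 30, 10),   # header cell 1
    ("Total", 150, 10), ("cost", 172, 10), # header cell 2
    ("42", 10, 80),                        # body cell 1
    ("99", 150, 80),                       # body cell 2
]
radius = heuristic_radius(len(words), width=200, height=100)
labels = cluster_words(words, radius)
```

In this made-up layout the radius comes out at roughly 28.9 units, so "Unit" and "price" fall into one cluster, "Total" and "cost" into another, and the two numbers each form their own cell. Expanding cells horizontally or grouping them vertically into rows and columns, as the abstract describes, is not covered by this sketch.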

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.