Patent · US Active

Heuristic domain targeted table detection and extraction technique

US10706228B2 · kind B2 · utility

Cited by: 15
References: 4
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Dec 1, 2017
Grant date: Jul 7, 2020
Priority date
Expiry date: Aug 4, 2038

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06V30/10
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method, system, and apparatus are provided for processing tables embedded within documents wherein a first table header is detected by using semantic groupings of table header terms to identify a minimum number of table header terms in a scanned line of a text document; a potential data zone is extracted by applying white space correlation analysis to a portion of the text document that is adjacent to the first table header; one or more data zone columns from the potential data zone are grouped and aligned with a corresponding header column in the first table header to form a candidate table; data cleansing is performed on the candidate table; and then one or more columns of the candidate table are evaluated using natural language processing to apply a specified table analysis.
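The first four steps of the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The term groups in HEADER_GROUPS, the MIN_HEADER_TERMS threshold, and all function names are hypothetical, chosen only for illustration. The white space correlation analysis is replaced by a much simpler stand-in: a line joins the data zone only if the blank character offsets it shares with the header still separate the same number of columns. Data cleansing is reduced to trimming cell whitespace, and the natural language processing step is omitted.

```python
import re

# Hypothetical semantic groupings of header terms; the abstract does not list any.
HEADER_GROUPS = {
    "identity": {"name", "id", "drug", "patient"},
    "quantity": {"dose", "amount", "count", "qty"},
    "time": {"date", "day", "year", "time"},
}
HEADER_VOCAB = set().union(*HEADER_GROUPS.values())
MIN_HEADER_TERMS = 2  # assumed minimum number of header terms per line


def find_header(lines):
    """Return the index of the first line containing enough header terms."""
    for i, line in enumerate(lines):
        tokens = re.findall(r"[a-z]+", line.lower())
        if sum(t in HEADER_VOCAB for t in tokens) >= MIN_HEADER_TERMS:
            return i
    return None


def space_positions(line, width):
    """Character offsets that are blank once the line is padded to width."""
    return {i for i, ch in enumerate(line.ljust(width)) if ch == " "}


def column_spans(gaps, width):
    """Half-open (start, end) spans of maximal runs of non-gap offsets."""
    spans, start = [], None
    for i in range(width + 1):
        in_gap = i == width or i in gaps
        if not in_gap and start is None:
            start = i
        elif in_gap and start is not None:
            spans.append((start, i))
            start = None
    return spans


def extract_table(lines):
    """Detect a header, grow the data zone below it, and align the columns."""
    header_idx = find_header(lines)
    if header_idx is None:
        return []
    width = max(len(line) for line in lines)
    gaps = space_positions(lines[header_idx], width)
    n_cols = len(column_spans(gaps, width))
    zone = []
    for line in lines[header_idx + 1:]:
        if not line.strip():
            break  # a blank line ends the data zone
        shared = gaps & space_positions(line, width)
        # Simplified stand-in for white space correlation: the shared blank
        # offsets must still separate as many columns as the header has.
        if len(column_spans(shared, width)) != n_cols:
            break
        gaps = shared
        zone.append(line)
    spans = column_spans(gaps, width)

    def split(line):
        # Slice by column span and trim each cell (minimal data cleansing).
        return [line.ljust(width)[a:b].strip() for a, b in spans]

    header = split(lines[header_idx])
    return [dict(zip(header, split(row))) for row in zone]
```

Under these assumptions, extract_table takes a list of fixed-width text lines and returns one dictionary per data row, keyed by the header cell above each column. The sketch handles spaces only, so tab-separated input would need to be expanded first.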

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.