Patent · US Active

Method and system for tabular information extraction

US11663842B2 · kind B2 · utility

Cited by: 1
References: 0
Claims: 18
Family size: 0

Assignee

Inventors

Key dates

Filing date: Nov 5, 2020
Grant date: May 30, 2023
Priority date
Expiry date: Feb 5, 2041

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06V30/412
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method and a system for extracting information from a table in a document are provided. The method includes: receiving a document that includes information arranged in a table; determining three sets of coordinates that respectively relate to lines, words, and characters included in the document; extracting a list of lines based on the first set of coordinates; reconstructing the rows of the table based on the list of lines and the second set of coordinates; reconstructing the columns of the table based on the reconstructed rows and the third set of coordinates; and outputting a reconstruction of the table. The three sets of coordinates are expressible in an hOCR format, which is based on an open standard for representing scanned information obtained by optical character recognition (OCR).
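
The general idea in the abstract can be illustrated with a minimal Python sketch. It is not the claimed method: it assumes well-formed hOCR markup using the standard `ocrx_word` class and `bbox x0 y0 x1 y1` title property, groups words into rows by vertical position, and infers columns from horizontal gaps between word boxes rather than from character coordinates. The function names, the `tol` and `min_gap` thresholds, and the sample markup are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

BBOX = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")

def parse_bbox(title):
    """Pull (x0, y0, x1, y1) out of an hOCR title attribute, or None."""
    m = BBOX.search(title or "")
    return tuple(int(v) for v in m.groups()) if m else None

def extract_words(hocr):
    """Return (text, bbox) for every element carrying the ocrx_word class."""
    root = ET.fromstring(hocr)
    words = []
    for el in root.iter():
        if "ocrx_word" in el.get("class", "").split():
            box = parse_bbox(el.get("title"))
            if box:
                words.append(("".join(el.itertext()).strip(), box))
    return words

def group_rows(words, tol=5):
    """Group words whose top edge lies within tol pixels of the row's first word."""
    rows = []
    for word in sorted(words, key=lambda w: w[1][1]):
        if rows and abs(rows[-1][0][1][1] - word[1][1]) <= tol:
            rows[-1].append(word)
        else:
            rows.append([word])
    # Order each row left to right.
    return [sorted(r, key=lambda w: w[1][0]) for r in rows]

def column_starts(words, min_gap=20):
    """Left edges of columns: a new column starts after a gap wider than min_gap."""
    spans = sorted((b[0], b[2]) for _, b in words)
    starts, right = [spans[0][0]], spans[0][1]
    for x0, x1 in spans[1:]:
        if x0 - right > min_gap:
            starts.append(x0)
        right = max(right, x1)
    return starts

def to_table(hocr):
    """Rebuild a list-of-rows table from hOCR word boxes."""
    words = extract_words(hocr)
    starts = column_starts(words)
    table = []
    for row in group_rows(words):
        cells = [""] * len(starts)
        for text, (x0, _, _, _) in row:
            # Assign the word to the right-most column starting at or before it.
            col = max(i for i, s in enumerate(starts) if s <= x0)
            cells[col] = (cells[col] + " " + text).strip()
        table.append(cells)
    return table

# Hypothetical hOCR fragment for a two-column table.
SAMPLE = """<div class="ocr_page">
<span class="ocr_line" title="bbox 10 10 300 30">
<span class="ocrx_word" title="bbox 10 10 60 30">Item</span>
<span class="ocrx_word" title="bbox 200 10 260 30">Price</span>
</span>
<span class="ocr_line" title="bbox 10 40 300 60">
<span class="ocrx_word" title="bbox 10 40 50 60">Green</span>
<span class="ocrx_word" title="bbox 55 40 90 60">tea</span>
<span class="ocrx_word" title="bbox 200 42 240 60">4.50</span>
</span>
</div>"""

if __name__ == "__main__":
    for row in to_table(SAMPLE):
        print(row)
```

Real OCR output is often full XHTML with a DOCTYPE and named entities, which `xml.etree.ElementTree` may reject, so a more tolerant HTML parser would likely be needed in practice. The fixed pixel thresholds are also a simplification; skewed scans or merged cells would call for a more careful approach.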

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.