Patent · US Active

Computerized recognition and extraction of tables in digitized documents

US11182604B1 · kind B1 · utility

Cited by: 16
References: 8
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Nov 26, 2019
Grant date: Nov 23, 2021
Priority date
Expiry date: Jun 10, 2040

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06V30/416
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Information contained in tables in a digitized document is extracted by retrieving table layout data regarding bounding boxes, each being auto-generated by the system and/or (re)generated by a user to the digitized image of a sample document. A row template is used to identify a first table by automatically scanning within the document. Upon detecting a possible row in the input image, a Row Possibility Confidence Value (RPCV) is generated that indicates a likelihood that the possible row corresponds to an actual row in the first table. The possible row is regarded as an actual row if the RPCV exceeds a predetermined threshold value. For repeated tables in a document, only the first table needs to be identified via bounding boxes. Related tables can also be linked to permit linked data to be extracted to a structured file. In addition, only the primary column in a readable and existent table header is required to extract table values across columns.
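
The thresholding step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the RPCV is computed or how a row template is represented, so everything below is an assumption made for illustration: the `RowTemplate` class, the per-column regular expressions, the scoring rule (fraction of template columns whose cell text matches), and the 0.75 threshold are all hypothetical and are not taken from the patent.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class RowTemplate:
    # Hypothetical representation: one regular expression per expected column.
    column_patterns: List[str]


def row_possibility_confidence(cells: List[str], template: RowTemplate) -> float:
    """Illustrative RPCV: fraction of template columns whose cell matches.

    This scoring rule is an assumption; the patent abstract only states that
    the value indicates how likely a candidate is to be an actual table row.
    """
    if not template.column_patterns:
        return 0.0
    hits = 0
    # zip stops at the shorter sequence, so missing cells count as misses.
    for pattern, cell in zip(template.column_patterns, cells):
        if re.fullmatch(pattern, cell.strip()):
            hits += 1
    return hits / len(template.column_patterns)


def extract_rows(
    candidate_rows: List[List[str]],
    template: RowTemplate,
    threshold: float = 0.75,  # assumed value for the "predetermined threshold"
) -> List[List[str]]:
    """Keep only candidates whose RPCV exceeds the threshold."""
    return [
        cells
        for cells in candidate_rows
        if row_possibility_confidence(cells, template) > threshold
    ]


if __name__ == "__main__":
    # Hypothetical template: ISO date, free-text description, decimal amount.
    template = RowTemplate([r"\d{4}-\d{2}-\d{2}", r".+", r"-?\d+\.\d{2}"])
    candidates = [
        ["2019-11-26", "Widget", "12.50"],  # looks like a table row
        ["Total", "", "12.50"],             # summary line, not a data row
        ["Page 1 of 3"],                    # page footer
    ]
    for row in extract_rows(candidates, template):
        print(row)
```

In this sketch a candidate is accepted only when its score is strictly greater than the threshold, mirroring the abstract's wording that the RPCV must exceed the predetermined value. A real system would derive candidates and cell boundaries from the bounding-box layout data rather than from pre-split strings.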

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.