Patent · US Active

Machine learning based end-to-end extraction of tables from electronic documents

US11837005B2 · kind B2 · utility

Cited by: 2
References: 27
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Feb 22, 2023
Grant date: Dec 5, 2023
Priority date
Expiry date: Feb 22, 2043

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06V30/10
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

In some embodiments, a method includes identifying a set of word bounding boxes in a first electronic document, and identifying locations of horizontal white space between two adjacent rows from a set of rows in a table. The method includes determining, using a Natural Language Processing algorithm, an entity name from a set of entity names for each table cell from a set of table cells in the table. The method includes determining, using a machine learning algorithm, a class from a set of classes for each row from the set of rows. The method includes extracting a set of table cell values associated with the set of table cells, and generating a second electronic document including the set of table cell values arranged in the set of rows and the set of columns such that the set of words in the table are computer-readable in the second electronic document.
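
The abstract does not say how the horizontal white space between rows is located. As an illustration only, the sketch below shows one simple way this could be done from word bounding boxes: merge the vertical extents of the boxes into row bands and report the empty spans between them. The box layout `(x0, y0, x1, y1)` with y increasing downward, the `min_gap` threshold, and the function names `find_row_gaps` and `group_rows` are assumptions made for this example, not details taken from the patent.

```python
from typing import List, Tuple

# Hypothetical box layout: (x0, y0, x1, y1), with y increasing downward.
Box = Tuple[float, float, float, float]


def find_row_gaps(boxes: List[Box], min_gap: float = 2.0) -> List[Tuple[float, float]]:
    """Return (top, bottom) spans of horizontal white space between rows.

    min_gap is an assumed threshold: smaller vertical gaps are treated as
    part of the same row rather than as a row separator.
    """
    # Sort the vertical extent of every word box by its top edge.
    intervals = sorted((y0, y1) for _, y0, _, y1 in boxes)
    gaps: List[Tuple[float, float]] = []
    if not intervals:
        return gaps
    band_bottom = intervals[0][1]
    for top, bottom in intervals[1:]:
        if top - band_bottom >= min_gap:
            # Empty span between the current row band and the next word box.
            gaps.append((band_bottom, top))
            band_bottom = bottom
        else:
            # Box overlaps (or nearly touches) the current band: extend it.
            band_bottom = max(band_bottom, bottom)
    return gaps


def group_rows(boxes: List[Box], gaps: List[Tuple[float, float]]) -> List[List[Box]]:
    """Assign each box to a row by counting the gaps that lie above its center."""
    rows: List[List[Box]] = [[] for _ in range(len(gaps) + 1)]
    for box in boxes:
        center = (box[1] + box[3]) / 2
        index = sum(1 for _, gap_bottom in gaps if center >= gap_bottom)
        rows[index].append(box)
    # Sorting tuples orders the boxes in each row left to right by x0.
    return [sorted(row) for row in rows]
```

Given the detected gaps, `group_rows` yields one list of word boxes per table row, which is the kind of row structure the later steps in the abstract (per-row classification and cell value extraction) would operate on.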

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.