Patent · US Active

System for information extraction from form-like documents

US11393233B2 · kind B2 · utility

Cited by: 4
References: 0
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jun 2, 2020
Grant date: Jul 19, 2022
Priority date
Expiry date: Jan 19, 2041

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06T2207/30176
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

The present disclosure is directed to extracting text from form-like documents. In particular, a computing system can obtain an image of a document that contains a plurality of portions of text. The computing system can extract one or more candidate text portions for each field type included in a target schema. The computing system can generate a respective input feature vector for each candidate for the field type. The computing system can generate a respective candidate embedding for the candidate text portion. The computing system can determine a respective score for each candidate text portion for the field type based at least in part on the respective candidate embedding for the candidate text portion. The computing system can assign one or more of the candidate text portions to the field type based on the respective scores.
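
The steps in the abstract (candidate extraction per field type, feature vector, candidate embedding, score, assignment) can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the document image has already been converted to text tokens with normalized positions, and every name in it (SCHEMA, TOKENS, WEIGHTS, FIELD_EMB, the regex candidate generators, the linear embedding and the sigmoid score) is hypothetical and chosen only for illustration.

```python
import math
import re
from dataclasses import dataclass

# Hypothetical target schema: field type -> pattern that generates candidates.
SCHEMA = {
    "invoice_date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "total_amount": re.compile(r"\$\d+\.\d{2}"),
}

@dataclass
class Candidate:
    text: str
    x: float  # normalized horizontal position on the page, 0..1
    y: float  # normalized vertical position on the page, 0..1

def extract_candidates(tokens, field_type):
    """Keep the tokens whose text matches the field type's pattern."""
    pattern = SCHEMA[field_type]
    return [Candidate(t, x, y) for t, x, y in tokens if pattern.fullmatch(t)]

def feature_vector(candidate):
    """Assumed input features: page position plus a constant bias term."""
    return [candidate.x, candidate.y, 1.0]

def embed(features, weights):
    """Assumed candidate embedding: a fixed linear projection of the features."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]

def score(candidate_emb, field_emb):
    """Assumed scoring rule: sigmoid of the dot product with a field embedding."""
    dot = sum(c * f for c, f in zip(candidate_emb, field_emb))
    return 1.0 / (1.0 + math.exp(-dot))

def assign(tokens, field_type, weights, field_emb):
    """Return the highest-scoring candidate for the field type, or None."""
    scored = [
        (score(embed(feature_vector(c), weights), field_emb), c)
        for c in extract_candidates(tokens, field_type)
    ]
    if not scored:
        return None
    return max(scored, key=lambda pair: pair[0])[1]

# Illustrative data only: (text, x, y) tokens and hand-picked parameters.
TOKENS = [("2020-06-02", 0.8, 0.1), ("2019-01-01", 0.2, 0.9), ("$10.00", 0.8, 0.9)]
WEIGHTS = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0]]  # favors top-right positions
FIELD_EMB = [1.0, 1.0]

best = assign(TOKENS, "invoice_date", WEIGHTS, FIELD_EMB)
print(best.text if best else None)
```

In a trained system the projection weights and field embedding would be learned rather than hand-set. The sketch only shows how the per-field candidate list, the embedding, and the score fit together.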

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.