System for information extraction from form-like documents
US11393233B2 · kind B2 · utility
Key dates
| Event | Date |
|---|---|
| Filing date | Jun 2, 2020 |
| Grant date | Jul 19, 2022 |
| Priority date | — |
| Expiry date | Jan 19, 2041 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06T2207/30176
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
The present disclosure is directed to extracting text from form-like documents. In particular, a computing system can obtain an image of a document that contains a plurality of portions of text. For each field type included in a target schema, the computing system can extract one or more candidate text portions. The computing system can generate a respective input feature vector for each candidate text portion for the field type, and a respective candidate embedding for each candidate text portion. The computing system can determine a respective score for each candidate text portion for the field type based at least in part on the respective candidate embedding for that candidate text portion. The computing system can then assign one or more of the candidate text portions to the field type based on the respective scores.
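The abstract's steps (candidate extraction per field type, input feature vector, candidate embedding, score, assignment) can be illustrated with a small sketch. This is not the patented implementation. The regex-based candidate generator, the position and neighbor-word features, the single tanh layer used as the encoder, the cosine-similarity score rescaled to [0, 1], the hand-set weights, and every name here (`TextPortion`, `extract_candidates`, `input_features`, `embed`, `score`, `assign`) are hypothetical assumptions chosen for readability.

```python
import math
import re
from dataclasses import dataclass, field


@dataclass
class TextPortion:
    text: str
    x: float  # hypothetical: normalized x position of the bounding box, 0..1
    y: float  # hypothetical: normalized y position of the bounding box, 0..1
    neighbors: list = field(default_factory=list)  # nearby words on the page


def extract_candidates(portions, pattern):
    """Keep portions whose whole text matches the field type's pattern (assumed detector)."""
    regex = re.compile(pattern)
    return [p for p in portions if regex.fullmatch(p.text)]


def input_features(p, vocab):
    """Position features followed by one indicator per vocabulary word seen nearby."""
    return [p.x, p.y] + [1.0 if w in p.neighbors else 0.0 for w in vocab]


def embed(features, weights):
    """One dense layer with tanh, standing in for a learned encoder."""
    return [math.tanh(sum(w * f for w, f in zip(row, features))) for row in weights]


def score(cand_emb, field_emb):
    """Cosine similarity between candidate and field embeddings, rescaled to [0, 1]."""
    dot = sum(a * b for a, b in zip(cand_emb, field_emb))
    norm = math.hypot(*cand_emb) * math.hypot(*field_emb)
    cos = dot / norm if norm else 0.0
    return (cos + 1.0) / 2.0


def assign(candidates, scores, threshold=0.5, max_values=1):
    """Return up to max_values candidates, best first, whose score clears the threshold."""
    ranked = sorted(zip(candidates, scores), key=lambda cs: cs[1], reverse=True)
    return [c for c, s in ranked[:max_values] if s >= threshold]


VOCAB = ["invoice", "due", "date"]
# Hypothetical "trained" parameters; the two position columns get zero weight here.
WEIGHTS = [
    [0.0, 0.0, 1.0, -1.0, 0.0],
    [0.0, 0.0, -1.0, 1.0, 0.0],
]
INVOICE_DATE_EMB = [1.0, -1.0]  # hypothetical embedding for the "invoice date" field type
DATE_PATTERN = r"\d{4}-\d{2}-\d{2}"

page = [
    TextPortion("2020-06-02", 0.8, 0.1, ["invoice", "date"]),
    TextPortion("2020-07-01", 0.8, 0.2, ["due", "date"]),
    TextPortion("$120.00", 0.8, 0.9, ["total"]),
]

candidates = extract_candidates(page, DATE_PATTERN)  # the two dates; "$120.00" is dropped
scores = [score(embed(input_features(c, VOCAB), WEIGHTS), INVOICE_DATE_EMB) for c in candidates]
result = assign(candidates, scores)
print([c.text for c in result])  # ['2020-06-02']
```

Raising `max_values` lets `assign` return more than one candidate for a field type, matching the abstract's "one or more". In a trained system the encoder weights and field embeddings would be learned from labeled documents rather than set by hand.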
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.