Document matching and data extraction
US11860950B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Mar 30, 2021 |
| Grant date | Jan 2, 2024 |
| Priority date | — |
| Expiry date | Dec 25, 2041 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N3/045
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
The system provides a generalized document automation framework that captures relevant data from documents by replicating the historical human actions associated with a document. The system may use machine vision and natural language processing to match a new document to a document in an existing corpus from which data has already been extracted by a human. The match is made by comparing both visual and textual elements, and it can be verified statistically by comparing the match metrics across multiple documents. Once the match has been found and verified, the system takes the extractions recorded for the historical document and maps them to similar regions in the new document, again using both visual and textual commonalities between the documents. Data is then extracted from these regions of interest in the new document, sanity-checked for data integrity against historical data, and passed downstream for processing.
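The match-and-verify flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything in it is an assumption made for illustration: the `Document` class, Jaccard overlap as the text metric, cosine similarity over a hypothetical layout feature vector as the visual metric, the equal weighting of the two, and a z-score of the best score against the rest of the corpus as the statistical check. Region mapping is reduced to reusing the matched document's stored regions unchanged.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from math import sqrt
from statistics import mean, pstdev


@dataclass
class Document:
    """Hypothetical container; the patent does not define this structure."""
    name: str
    tokens: set[str]        # words from OCR or a text layer
    visual: list[float]     # assumed layout feature vector
    # field name -> (left, top, right, bottom) region recorded by a human
    regions: dict[str, tuple[int, int, int, int]] = field(default_factory=dict)


def text_similarity(a: Document, b: Document) -> float:
    """Jaccard overlap of token sets (an assumed stand-in for the NLP step)."""
    union = a.tokens | b.tokens
    return len(a.tokens & b.tokens) / len(union) if union else 0.0


def visual_similarity(a: Document, b: Document) -> float:
    """Cosine similarity of feature vectors (an assumed stand-in for machine vision)."""
    dot = sum(x * y for x, y in zip(a.visual, b.visual))
    norm = sqrt(sum(x * x for x in a.visual)) * sqrt(sum(y * y for y in b.visual))
    return dot / norm if norm else 0.0


def combined_score(a: Document, b: Document, text_weight: float = 0.5) -> float:
    """Weighted blend of the textual and visual scores; the weight is an assumption."""
    return text_weight * text_similarity(a, b) + (1 - text_weight) * visual_similarity(a, b)


def find_verified_match(new_doc: Document, corpus: list[Document],
                        min_z: float = 1.0) -> Document | None:
    """Return the best-scoring historical document, or None if it does not stand out."""
    if not corpus:
        return None
    scores = [combined_score(new_doc, doc) for doc in corpus]
    best = max(range(len(corpus)), key=scores.__getitem__)
    others = scores[:best] + scores[best + 1:]
    if not others:
        return None  # nothing to compare against, so the match cannot be verified
    gap = scores[best] - mean(others)
    spread = pstdev(others)
    # Accept only if the best score is clearly separated from the other scores.
    verified = gap > 0 if spread == 0 else gap / spread >= min_z
    return corpus[best] if verified else None


corpus = [
    Document("acme_invoice", {"invoice", "total", "date", "acme", "due"},
             [0.9, 0.1, 0.5], {"total": (400, 700, 520, 730)}),
    Document("bank_statement", {"statement", "balance", "date"}, [0.1, 1.0, 0.2]),
    Document("shipping_label", {"ship", "to", "from", "weight"}, [0.0, 0.3, 1.0]),
]
new_doc = Document("new_invoice", {"invoice", "total", "date", "acme"}, [1.0, 0.0, 0.5])

match = find_verified_match(new_doc, corpus)
# Simplification: reuse the historical regions as the regions of interest.
regions_of_interest = dict(match.regions) if match else {}
```

In a fuller treatment the stored regions would be re-located in the new document using the same visual and textual cues, and the extracted values would be checked against historical data before being passed downstream. A z-score over a very small corpus is a weak test, so a real system would likely need a larger comparison set or an additional absolute threshold.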
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.