Data extraction confidence attribute with transformations
US8676731B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Jul 11, 2011 |
| Grant date | Mar 18, 2014 |
| Priority date | — |
| Expiry date | Mar 26, 2032 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V30/40
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A data extraction system for receiving and scanning documents to generate ordered input for storage in a database employs a non-linear statistical model for a data extraction sequence having a plurality of transformations. Each transformation transitions an extracted data value in various forms from a raw data image to a computed data value. For each transformation, a confidence model learns a confidence component for the particular transformation. The learned confidence components, generated from a control set of documents having known values, are employed in a production mode with actual raw data. The confidence component corresponds to a likelihood of transformation accuracy, and the confidence model aggregates the confidence components to compute a confidence for the extracted data value. A database stores the extracted data value labeled with the computed confidence attribute for subsequent use by an application employing the extracted data.
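The learn-then-aggregate flow described in the abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract specifies a non-linear statistical model but does not give its form, so this example assumes a much simpler scheme in which each transformation's confidence component is its smoothed accuracy on a control set, and the components are multiplied together as if the transformations were independent. All names (`ControlSample`, `learn_components`, `aggregate_confidence`, `label_value`) and the transformation labels `"ocr"` and `"parse"` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ControlSample:
    """One control-set observation for a single transformation (hypothetical type)."""
    transformation: str  # e.g. "ocr" or "parse"; labels are illustrative
    predicted: str       # value the transformation produced
    expected: str        # known-correct value from the control set


def learn_components(samples):
    """Learn one confidence component per transformation.

    Assumption: the component is the Laplace-smoothed fraction of control
    samples the transformation got right. The patent's model may differ.
    """
    totals, correct = {}, {}
    for s in samples:
        totals[s.transformation] = totals.get(s.transformation, 0) + 1
        if s.predicted == s.expected:
            correct[s.transformation] = correct.get(s.transformation, 0) + 1
    return {t: (correct.get(t, 0) + 1) / (n + 2) for t, n in totals.items()}


def aggregate_confidence(components, sequence):
    """Aggregate components over the extraction sequence.

    Assumption: transformations fail independently, so the overall
    confidence is the product of the per-transformation components.
    """
    confidence = 1.0
    for transformation in sequence:
        confidence *= components[transformation]
    return confidence


def label_value(value, components, sequence):
    """Attach the computed confidence to an extracted value, ready for storage."""
    return {"value": value, "confidence": aggregate_confidence(components, sequence)}
```

In production mode, under these assumptions, a value produced by the sequence `["ocr", "parse"]` would be stored as `label_value(value, components, ["ocr", "parse"])`, so that a downstream application can filter or route records by their `confidence` field.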
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.