Machine learning based information extraction
US12333838B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Aug 17, 2022 |
| Grant date | Jun 17, 2025 |
| Priority date | — |
| Expiry date | Jan 16, 2044 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V30/10
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Computer-readable media, methods, and systems are disclosed for applying machine learning mechanisms to classify and validate documents based on expense rule sets and external data validation services. Document images associated with expenses are received in connection with a reimbursable event. For each received document image, data associated with the image is transmitted to an optical character recognition (OCR) image processor that can recognize contents and associated coordinates. OCR data is received and transmitted to a text tokenizer. Tokenized text corresponding to expense details is received, and the tokenized text and coordinates are sent to a text feature generator. Text feature vectors are received and transmitted to a document classifier, and a document classification is received. Document fields are extracted, and based thereon a document is validated and a corresponding reimbursement instruction is generated.
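The sequence of stages in the abstract can be illustrated with a small Python sketch. It is not the patented implementation: every name here (`OcrWord`, `tokenize`, `featurize`, `classify`, `extract_fields`, `validate`, `process`) is hypothetical, the OCR output is supplied by hand rather than produced by an image processor, the classifier is a keyword rule standing in for a trained model, and the expense rule set is reduced to a per-document-type amount limit. The external data validation services mentioned in the abstract are left out.

```python
import re
from dataclasses import dataclass


@dataclass
class OcrWord:
    """One recognized word with its page coordinates (hypothetical shape)."""
    text: str
    x: float  # left edge of the word
    y: float  # top edge of the word


def tokenize(words):
    """Lowercase each non-empty OCR word and keep its coordinates."""
    return [(w.text.lower(), w.x, w.y) for w in words if w.text.strip()]


def featurize(tokens):
    """Toy feature vector: three keyword counts plus mean vertical position."""
    texts = [t for t, _, _ in tokens]
    mean_y = sum(y for _, _, y in tokens) / len(tokens) if tokens else 0.0
    return [texts.count("hotel"), texts.count("taxi"), texts.count("total"), mean_y]


def classify(features):
    """Keyword rule standing in for a trained document classifier."""
    hotel, taxi = features[0], features[1]
    if hotel > taxi:
        return "hotel_receipt"
    if taxi > hotel:
        return "taxi_receipt"
    return "unknown"


def extract_fields(tokens):
    """Take the amount as the first numeric token that follows 'total'."""
    texts = [t for t, _, _ in tokens]
    for i, t in enumerate(texts):
        if t == "total":
            for nxt in texts[i + 1:]:
                if re.fullmatch(r"\d+(\.\d{1,2})?", nxt):
                    return {"amount": float(nxt)}
    return {}


def validate(doc_type, fields, limits):
    """Assumed rule set: a known document type with an amount within its limit."""
    amount = fields.get("amount")
    return doc_type in limits and amount is not None and amount <= limits[doc_type]


def process(words, limits):
    """Run the stages in order and return a reimbursement decision."""
    tokens = tokenize(words)
    doc_type = classify(featurize(tokens))
    fields = extract_fields(tokens)
    return {
        "doc_type": doc_type,
        "fields": fields,
        "reimburse": validate(doc_type, fields, limits),
    }


# Hand-written OCR output for a made-up hotel receipt.
words = [OcrWord("Hotel", 10, 10), OcrWord("Total", 10, 200), OcrWord("120.50", 80, 200)]
result = process(words, limits={"hotel_receipt": 200.0})
```

In this sketch the coordinates only feed one feature (mean vertical position); a layout-aware model would presumably make heavier use of them, for example to associate a label with the value printed next to it.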
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.