System and method for machine-learning based extraction of information from documents
US11886820B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Oct 6, 2020 |
| Grant date | Jan 30, 2024 |
| Priority date | — |
| Expiry date | Aug 23, 2042 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06N20/00
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method and system are provided for training a machine-learning (ML) system or module to produce an ML model. In one embodiment, a method includes using a labeled entities set to train an ML system to obtain an ML model, and using the trained ML model to predict labels for entities in an unlabeled entities set, yielding a machine-labeled entities set. One or more individual ML models may be trained and used in this way, where each individual ML model corresponds to a respective document source. The document sources can be identified via classification of a corpus of documents. The prediction of labels provides a respective confidence score for each machine-labeled entity. The method also includes selecting, from the machine-labeled entities set, a subset of machine-labeled entities whose respective confidence scores are at least equal to a threshold confidence score, and updating the labeled entities set by adding the selected subset to it. The method further includes removing from the machine-labeled entities set the selected subset of machine-labeled entities and deleting labels assigned to the entities in the updated machine-labele…
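The train, predict, select, and update steps described in the abstract resemble a confidence-thresholded self-training loop. The following is a minimal sketch of that idea, not the patented implementation. Everything named here is an assumption made for illustration: the function names `train`, `predict`, and `self_train`, the toy nearest-centroid model over one-dimensional features, the inverse-distance confidence score, the default threshold of 0.8, and the `max_rounds` cap. Repeating the steps over several rounds is also an assumption, since the abstract text is truncated before it says what follows the label deletion.

```python
from collections import defaultdict


def train(labeled):
    """Toy stand-in for ML training: one centroid per label (1-D features)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in labeled:
        sums[y] += x
        counts[y] += 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(model, x):
    """Return (label, confidence); confidence is the winner's inverse-distance share."""
    weights = {y: 1.0 / (abs(x - c) + 1e-9) for y, c in model.items()}
    label = max(weights, key=weights.get)
    return label, weights[label] / sum(weights.values())


def self_train(labeled, unlabeled, threshold=0.8, max_rounds=10):
    """Hypothetical loop: grow the labeled set with confident machine labels."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        if not unlabeled:
            break
        model = train(labeled)
        # Predict a label and a confidence score for every unlabeled entity.
        machine_labeled = [(x, *predict(model, x)) for x in unlabeled]
        # Select entities whose confidence is at least the threshold.
        selected = [(x, y) for x, y, conf in machine_labeled if conf >= threshold]
        if not selected:
            break
        # Update the labeled set with the selected subset.
        labeled.extend(selected)
        # Remove the selected subset and delete the labels of what remains,
        # which returns those entities to the unlabeled pool.
        unlabeled = [x for x, _, conf in machine_labeled if conf < threshold]
    return train(labeled), labeled, unlabeled


# Usage: two seed labels, three unlabeled values.
model, grown, remaining = self_train([(0.0, "a"), (10.0, "b")], [1.0, 9.0, 5.0])
print(remaining)  # [5.0]
```

In this toy run the values 1.0 and 9.0 are confidently labeled and absorbed into the labeled set, while 5.0 sits midway between both centroids, never reaches the threshold, and stays unlabeled. A per-source variant, as the abstract suggests, would presumably run one such loop for each document source.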
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.