Patent · US Active

System and method for machine-learning based extraction of information from documents

US11886820B2 · kind B2 · utility

Cited by	1

References	3

Claims	20

Family size	0

Assignee

Genpact Luxembourg S.à r.l. II · LU

Inventors

Sreekanth Menon · Kanchinakote, IN
Prakash Selvakumar · Kanchinakote, IN
Sudheesh Sudevan · Thalassery, IN

Key dates

Filing date	Oct 6, 2020
Grant date	Jan 30, 2024
Priority date	—
Expiry date	Aug 23, 2042

Classification

Technology area (CPC G)	Physics
CPC primary	G06N20/00
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

A method and system are provided for training a machine-learning (ML) system or module and for providing an ML model. In one embodiment, a method includes using a labeled entities set to train an ML system to obtain an ML model, and using the trained ML model to predict labels for entities in an unlabeled entities set, yielding a machine-labeled entities set. One or more individual ML models may be trained and used in this way, where each individual ML model corresponds to a respective document source. The document sources can be identified via classification of a corpus of documents. The prediction of labels provides a respective confidence score for each machine-labeled entity. The method also includes selecting, from the machine-labeled entities set, a subset of machine-labeled entities each having a confidence score at least equal to a threshold confidence score, and updating the labeled entities set by adding the selected subset to it. The method further includes removing from the machine-labeled entities set the selected subset of machine-labeled entities and deleting labels assigned to the entities in the updated machine-labele…
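
The loop described in the abstract resembles what is commonly called self-training or pseudo-labeling. The following is a minimal Python sketch of that general idea, not the patented implementation. The names fit_centroids and self_train, the one-dimensional toy classifier, the confidence formula, and the max_rounds cap are all hypothetical choices made for illustration. Because the abstract is truncated, treating the low-confidence remainder as unlabeled again and repeating the cycle is an assumption.

```python
from collections import defaultdict
from typing import Callable

# Hypothetical types: an entity is one float feature, a label is a string.
Predictor = Callable[[float], tuple[str, float]]


def fit_centroids(labeled: list[tuple[float, str]]) -> Predictor:
    """Toy stand-in for the ML system: a 1-D nearest-centroid classifier."""
    groups: dict[str, list[float]] = defaultdict(list)
    for value, label in labeled:
        groups[label].append(value)
    centroids = {label: sum(vs) / len(vs) for label, vs in groups.items()}

    def predict(value: float) -> tuple[str, float]:
        # Rank labels by distance from the value to each label's centroid.
        ranked = sorted(centroids, key=lambda lab: abs(value - centroids[lab]))
        nearest = abs(value - centroids[ranked[0]])
        if len(ranked) == 1:
            return ranked[0], 1.0
        runner_up = abs(value - centroids[ranked[1]])
        total = nearest + runner_up
        # Illustrative confidence in [0.5, 1.0]; 0.5 means equidistant.
        confidence = 0.5 if total == 0 else runner_up / total
        return ranked[0], confidence

    return predict


def self_train(labeled, unlabeled, threshold, max_rounds=10):
    """Grow the labeled set with machine labels that meet the threshold."""
    labeled = list(labeled)
    unlabeled = list(unlabeled)
    for _ in range(max_rounds):
        if not unlabeled:
            break
        model = fit_centroids(labeled)
        # Predict a label and confidence score for every unlabeled entity.
        machine_labeled = [(e, *model(e)) for e in unlabeled]
        selected = [(e, lab) for e, lab, conf in machine_labeled
                    if conf >= threshold]
        if not selected:
            break  # nothing met the threshold, so another round cannot help
        labeled.extend(selected)
        # Assumption: the remainder loses its machine labels and is retried.
        unlabeled = [e for e, _lab, conf in machine_labeled
                     if conf < threshold]
    return fit_centroids(labeled), labeled, unlabeled


seed = [(0.0, "a"), (10.0, "b")]
final_model, final_labeled, remaining = self_train(
    seed, [1.0, 9.0, 5.0, 4.0], threshold=0.8
)
```

In this toy run the entities 1.0 and 9.0 clear the threshold in the first round and join the labeled set, while 5.0 and 4.0 stay unlabeled because their confidence never reaches 0.8. A per-source variant, as the abstract mentions, could keep one such labeled set and model per document source.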

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.