Patent · US Active

Extracting information from unstructured documents using natural language processing and conversion of unstructured documents into structured documents

US11423042B2 · kind B2 · utility

Cited by	3
References	7
Claims	20
Family size	0

Assignee

International Business Machines Corporation · US

Inventors

Jothilakshmi Sirangimoorthy · Canton, US
Ritwik Ray · Apex, US
Hui Wang · Ann Arbor, US
Jonathan Chapin RAND · Ann Arbor, US
Scott R. Carrier · Apex, US

Key dates

Filing date	Feb 7, 2020
Grant date	Aug 23, 2022
Priority date	—
Expiry date	Jun 25, 2040

Classification

Technology area (CPC G)	Physics
CPC primary	G06F16/345
WIPO field	Computer technology
WIPO sector	Electrical engineering

Abstract

Aspects of the present disclosure describe techniques for generating a machine learning model for extracting information from textual content. A method generally includes receiving a training data set including a plurality of documents having related textual strings. A relevancy model is generated from the training data set. The relevancy model is generally configured to generate relevance scores for a plurality of words extracted from the plurality of documents. A knowledge graph model illustrating relationships between the plurality of words extracted from the plurality of documents is also generated from the training data set. The relevancy model and the knowledge graph model are aggregated into a complementary model including a plurality of nodes from the knowledge graph model and weights associated with edges between connected nodes, wherein the weights comprise relevance scores generated from the relevancy model. The complementary model is then deployed for use in analyzing documents.
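
The aggregation step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how relevance scores or graph relationships are computed, so the sketch assumes a smoothed TF-IDF average as a stand-in relevancy model, same-document co-occurrence as a stand-in knowledge graph, and the mean of the two endpoint scores as the edge weight. The function names `relevance_scores`, `knowledge_graph`, and `aggregate`, and the toy documents, are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def relevance_scores(docs):
    """Stand-in relevancy model (assumption): mean smoothed TF-IDF per word."""
    n_docs = len(docs)
    # Number of documents in which each word appears at least once.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    totals = Counter()
    for doc in docs:
        for word, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0
            totals[word] += tf * idf
    return {word: total / n_docs for word, total in totals.items()}


def knowledge_graph(docs):
    """Stand-in knowledge graph (assumption): link words sharing a document."""
    edges = set()
    for doc in docs:
        # Sorting gives each undirected edge a single canonical (a, b) form.
        for a, b in combinations(sorted(set(doc)), 2):
            edges.add((a, b))
    return edges


def aggregate(scores, edges):
    """Combine both models: nodes come from the graph, weights from the scores."""
    nodes = {word for edge in edges for word in edge}
    # Assumed weighting rule: mean relevance of the two connected words.
    weights = {(a, b): (scores[a] + scores[b]) / 2 for a, b in edges}
    return {"nodes": nodes, "weights": weights}


# Toy training set: each document is already tokenized into words.
docs = [
    ["patient", "dose", "aspirin"],
    ["patient", "dose", "ibuprofen"],
]
scores = relevance_scores(docs)
model = aggregate(scores, knowledge_graph(docs))
```

In this toy run, `model["nodes"]` holds the four distinct words, and no edge joins "aspirin" and "ibuprofen" because they never appear in the same document. A real system would likely replace both stand-ins with trained models; only the final combination of graph nodes with score-derived edge weights mirrors the abstract.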

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.