Extracting information from unstructured documents using natural language processing and conversion of unstructured documents into structured documents
US11423042B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Feb 7, 2020 |
| Grant date | Aug 23, 2022 |
| Priority date | — |
| Expiry date | Jun 25, 2040 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/345
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Aspects of the present disclosure describe techniques for generating a machine learning model for extracting information from textual content. The method generally includes receiving a training data set including a plurality of documents having related textual strings. A relevancy model is generated from the training data set. The relevancy model is generally configured to generate relevance scores for a plurality of words extracted from the plurality of documents. A knowledge graph model illustrating relationships between the plurality of words extracted from the plurality of documents is generated from the training data set. The relevancy model and the knowledge graph model are aggregated into a complementary model including a plurality of nodes from the knowledge graph model and weights associated with edges between connected nodes, wherein the weights comprise relevance scores generated from the relevancy model. The complementary model is then deployed for use in analyzing documents.
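The aggregation step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation; the abstract does not say how relevance scores or graph edges are computed. The sketch assumes a smoothed TF-IDF-style score as the relevancy model, document-level word co-occurrence as the knowledge graph, and the mean of the two endpoint scores as the edge weight. All function names (`tokenize`, `relevance_scores`, `knowledge_graph`, `complementary_model`) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation (assumed tokenizer)."""
    stripped = (w.strip(".,;:!?()").lower() for w in text.split())
    return [w for w in stripped if w]


def relevance_scores(docs):
    """Stand-in relevancy model: corpus term frequency times a smoothed IDF."""
    n = len(docs)
    tf, df = Counter(), Counter()
    for doc in docs:
        tokens = tokenize(doc)
        tf.update(tokens)        # total occurrences of each word
        df.update(set(tokens))   # number of documents containing each word
    return {w: tf[w] * (math.log((1 + n) / (1 + df[w])) + 1.0) for w in tf}


def knowledge_graph(docs):
    """Stand-in knowledge graph: an undirected edge joins words sharing a document."""
    edges = set()
    for doc in docs:
        words = sorted(set(tokenize(doc)))
        edges.update(combinations(words, 2))  # pairs are alphabetically ordered
    return edges


def complementary_model(docs):
    """Aggregate both models: graph nodes plus edges weighted by relevance scores."""
    scores = relevance_scores(docs)
    edges = knowledge_graph(docs)
    # Assumption: an edge weight is the mean relevance of its two endpoints.
    weights = {(a, b): (scores[a] + scores[b]) / 2 for a, b in edges}
    return {"nodes": set(scores), "weights": weights}


model = complementary_model(["Invoice total due.", "Invoice number"])
```

Here `model["nodes"]` holds the extracted words and `model["weights"]` maps each alphabetically ordered word pair to its edge weight. Other scoring or edge-weighting schemes would fit the same structure.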
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.