Patent · US Active

Schema-informed extraction for unstructured data

US11494425B2 · kind B2 · utility

Cited by: 0
References: 1
Claims: 24
Family size: 0

Assignee

Inventors

Key dates

Filing date: Feb 3, 2020
Grant date: Nov 8, 2022
Priority date
Expiry date: Jun 20, 2040

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/177
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method of extracting data from documents is provided. The method comprises receiving input of a number of documents and input of a schema of data items available for extraction from the documents. The documents are parsed into a machine-readable representation, and data items in the machine-readable representation are identified according to the schema. Interpretations of data items are propagated within the documents to disambiguate identified data items, and identified data items are matched with other data items in the documents according to the schema. Only identified data items that include a minimal set of interpretations specified by the schema are extracted.
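
The pipeline outlined in the abstract (parse, identify, propagate, match, extract) can be illustrated with a small Python sketch. This is not the patented implementation. The schema layout, the regular expressions, the currency-propagation rule, and every function name below are assumptions chosen only to make the five steps concrete on a toy invoice-like text.

```python
import re
from dataclasses import dataclass, field

# Hypothetical schema: how to spot candidate items, how to spot labels,
# and the minimal set of interpretations an item needs to be extracted.
SCHEMA = {
    "item_pattern": re.compile(r"(?:(USD|EUR)\s*)?(\d+\.\d{2})"),
    "label_pattern": re.compile(r"^(\w+):"),
    "required": {"value", "currency", "label"},
}


@dataclass
class Item:
    line: int
    interpretations: dict = field(default_factory=dict)


def parse(document):
    """Machine-readable representation: (line_number, text) for non-blank lines."""
    return [(i, ln.strip()) for i, ln in enumerate(document.splitlines()) if ln.strip()]


def identify(lines, schema):
    """Find candidate data items according to the schema's item pattern."""
    items = []
    for n, text in lines:
        for m in schema["item_pattern"].finditer(text):
            interp = {"value": float(m.group(2))}
            if m.group(1):
                interp["currency"] = m.group(1)
            items.append(Item(n, interp))
    return items


def propagate(items):
    """Assumed rule: if the document names exactly one currency,
    copy it to items that lack a currency interpretation."""
    seen = {it.interpretations["currency"] for it in items
            if "currency" in it.interpretations}
    if len(seen) == 1:
        (only,) = seen
        for it in items:
            it.interpretations.setdefault("currency", only)


def match_labels(items, lines, schema):
    """Match each item with a label found at the start of its line."""
    by_line = dict(lines)
    for it in items:
        m = schema["label_pattern"].match(by_line[it.line])
        if m:
            it.interpretations["label"] = m.group(1)


def extract(document, schema):
    """Run all steps; keep only items carrying the required interpretations."""
    lines = parse(document)
    items = identify(lines, schema)
    propagate(items)
    match_labels(items, lines, schema)
    return [it.interpretations for it in items
            if schema["required"].issubset(it.interpretations)]


if __name__ == "__main__":
    doc = "Subtotal: USD 10.00\nTax: 2.50\nTotal: 12.50\nPaid 5.00"
    for row in extract(doc, SCHEMA):
        print(row)
```

In this toy input, the "Tax" and "Total" amounts have no currency of their own and receive "USD" by propagation from the "Subtotal" line, while the "Paid" amount has no matching label and is therefore dropped at the final filtering step. If a document mentions two different currencies, the assumed propagation rule does nothing and the ambiguous amounts are left out.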

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.