Schema-informed extraction for unstructured data
US11494425B2 · kind B2 · utility
Key dates
| Event | Date |
|---|---|
| Filing date | Feb 3, 2020 |
| Grant date | Nov 8, 2022 |
| Priority date | — |
| Expiry date | Jun 20, 2040 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/177
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method of extracting data from documents is provided. The method comprises receiving input of a number of documents and input of a schema of data items available for extraction from the documents. The documents are parsed into a machine-readable representation, and data items in the machine-readable representation are identified according to the schema. Interpretations of data items are propagated within the documents to disambiguate identified data items, and identified data items are matched with other data items in the documents according to the schema. Only identified data items that include a minimal set of interpretations specified by the schema are extracted.
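The abstract describes a pipeline: parse the documents, identify schema items, propagate interpretations within each document, match items to one another, and keep only items that carry the schema's minimal set of interpretations. The following is a minimal sketch of that flow under stated assumptions, not the patented implementation. The toy `SCHEMA`, the `Item` class, and the functions `parse`, `identify`, `propagate`, `match`, and `extract` are hypothetical. Regex detection, keyword-based interpretations, same-value propagation, and same-line matching are simple stand-ins chosen for illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy schema (not from the patent): a detection regex, context
# keywords that add an interpretation, an optional partner type to match
# with, and the minimal interpretation set required for extraction.
SCHEMA = {
    "date": {"pattern": r"\b\d{4}-\d{2}-\d{2}\b",
             "keywords": {"due": "due", "issued": "issued"},
             "match_with": None, "minimal": {"date", "due"}},
    "amount": {"pattern": r"\b\d+\.\d{2}\b",
               "keywords": {"total": "total"},
               "match_with": "currency",
               "minimal": {"amount", "total", "matched:currency"}},
    "currency": {"pattern": r"\b(?:USD|EUR|GBP)\b", "keywords": {},
                 "match_with": None, "minimal": {"currency"}},
}


@dataclass
class Item:
    kind: str
    value: str
    line: int
    interpretations: set = field(default_factory=set)


def parse(document: str) -> list[str]:
    """Stand-in machine-readable representation: a list of text lines."""
    return document.splitlines()


def identify(lines: list[str], schema: dict) -> list[Item]:
    """Find schema items; the type name plus context keywords become interpretations."""
    items = []
    for n, text in enumerate(lines):
        lowered = text.lower()
        for kind, spec in schema.items():
            for m in re.finditer(spec["pattern"], text):
                interp = {kind}
                for word, label in spec["keywords"].items():
                    if re.search(rf"\b{re.escape(word)}\b", lowered):
                        interp.add(label)
                items.append(Item(kind, m.group(), n, interp))
    return items


def propagate(items: list[Item]) -> None:
    """Disambiguate: items with the same kind and value share interpretations."""
    pooled: dict = {}
    for it in items:
        pooled.setdefault((it.kind, it.value), set()).update(it.interpretations)
    for it in items:
        it.interpretations = set(pooled[(it.kind, it.value)])


def match(items: list[Item], schema: dict) -> None:
    """Mark items that have a same-line partner of the type the schema names."""
    for it in items:
        partner = schema[it.kind]["match_with"]
        if partner and any(o.kind == partner and o.line == it.line for o in items):
            it.interpretations.add(f"matched:{partner}")


def extract(documents: list[str], schema: dict) -> list[tuple[str, str]]:
    """Keep only items whose interpretations include the schema's minimal set."""
    results = []
    for doc in documents:
        items = identify(parse(doc), schema)
        propagate(items)
        match(items, schema)
        results.extend((it.kind, it.value) for it in items
                       if schema[it.kind]["minimal"] <= it.interpretations)
    return list(dict.fromkeys(results))  # de-duplicate, preserving order


if __name__ == "__main__":
    doc = ("Invoice issued 2024-04-01\nPayment due 2024-05-01\n"
           "Total 120.00 USD\nShipping 15.00 USD\nReminder: 2024-05-01")
    print(extract([doc], SCHEMA))
    # → [('date', '2024-05-01'), ('amount', '120.00'), ('currency', 'USD')]
```

In this toy run the issue date is dropped because it never acquires the required `due` interpretation. The shipping amount is dropped because it lacks `total`. The bare `2024-05-01` on the reminder line qualifies only because propagation copies `due` from its earlier occurrence.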
Source: USPTO / EPO open patent data.