Electronic document content extraction and document type determination
US10909309B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jan 12, 2018 |
| Grant date | Feb 2, 2021 |
| Priority date | — |
| Expiry date | Nov 14, 2038 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V30/418
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A system and method include receiving content of an electronic document having a document type, the content divided into components each having a unique identifier, and selecting an extraction schema based on the document type, the extraction schema having a plurality of data categories. For each of the components, the extraction schema is applied to identify content of the component that corresponds to individual ones of the data categories, and category metadata indicative of that content is saved by the processor, in an electronic data storage, in a record associated with the component. Once the category metadata has been obtained for each of the components, the extraction schema is applied to the category metadata of each of the components and to the electronic document as a whole to determine document metadata. A user interface displays the document metadata.
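The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Component` class, the `SCHEMAS` table, the regex-based categories, the `"contract"` document type, and the in-memory `store` dictionary standing in for the electronic data storage are all hypothetical names and choices made for illustration. The sketch only shows the two passes, first per component and then over the collected metadata and the document as a whole.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Component:
    """One piece of the document content, addressed by a unique identifier."""
    component_id: str
    text: str
    category_metadata: Dict[str, List[str]] = field(default_factory=dict)


# Hypothetical extraction schemas, keyed by document type.
# Each schema maps a data category to a pattern that recognizes it.
SCHEMAS = {
    "contract": {
        "party": re.compile(r"\b[A-Z][a-z]+ (?:Inc|LLC|Corp)\b"),
        "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    },
}


def extract_document(
    doc_type: str, components: List[Component]
) -> Tuple[dict, Dict[str, Dict[str, List[str]]]]:
    # Select the extraction schema based on the document type.
    schema = SCHEMAS[doc_type]
    store: Dict[str, Dict[str, List[str]]] = {}  # assumed stand-in for data storage

    # Pass 1: apply the schema to each component and save its category
    # metadata in a record associated with that component's identifier.
    for comp in components:
        for category, pattern in schema.items():
            matches = pattern.findall(comp.text)
            if matches:
                comp.category_metadata[category] = matches
        store[comp.component_id] = comp.category_metadata

    # Pass 2: once every component has metadata, apply the schema to that
    # metadata and to the document as a whole to get document metadata.
    document_metadata: dict = {
        "document_type": doc_type,
        "component_count": len(components),
    }
    for category in schema:
        values: List[str] = []
        for comp in components:
            for value in comp.category_metadata.get(category, []):
                if value not in values:  # keep first-seen order, drop repeats
                    values.append(value)
        document_metadata[category] = values
    return document_metadata, store


components = [
    Component("c1", "Acme Inc signed on 2018-01-12."),
    Component("c2", "Acme Inc and Globex LLC renewed the agreement."),
]
document_metadata, store = extract_document("contract", components)
print(document_metadata)  # what a user interface would display
```

Keeping the per-component records separate from the document-level result mirrors the abstract's distinction between category metadata and document metadata. A real system would likely replace the regular expressions with whatever recognizers the schema actually defines.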
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.