Method and system for generating parsed document from digital document
US11200412B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Dec 27, 2017 |
| Grant date | Dec 14, 2021 |
| Priority date | — |
| Expiry date | Aug 27, 2038 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V30/10
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method and system for generating a parsed document from a digital document. The method includes segmenting the digital document into at least one section; classifying each section into at least one of the following classes: a text class, a table class, a figure class, or a noise class; identifying a reading order of the digital document; and processing each section. Processing each section comprises extracting content from the section based on its class, and structuring the extracted content based on the reading order to generate the parsed document.
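The steps named in the abstract (segment, classify, determine reading order, extract per class, structure) can be illustrated with a minimal Python sketch. This is not the patented implementation: every name here (`Section`, `segment`, `classify`, `reading_order`, `EXTRACTORS`, `parse`) is hypothetical, the classifier is a toy set of string rules, the reading order is assumed to be top-to-bottom then left-to-right, and dropping noise sections from the output is an assumption the abstract does not state.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# The four classes named in the abstract.
CLASSES = ("text", "table", "figure", "noise")


@dataclass
class Section:
    raw: str         # raw content of the region (hypothetical representation)
    top: int         # assumed vertical position, used only for ordering
    left: int        # assumed horizontal position, used only for ordering
    label: str = ""  # one of CLASSES, filled in by classify()


def segment(document: List[dict]) -> List[Section]:
    """Assumption: the input is already a list of regions, one per section."""
    return [Section(raw=r["raw"], top=r["top"], left=r["left"]) for r in document]


def classify(section: Section) -> str:
    """Toy rule-based stand-in for whatever classifier the method uses."""
    raw = section.raw.strip()
    if not raw:
        return "noise"
    if raw.startswith("[figure]"):
        return "figure"
    if "|" in raw:
        return "table"
    return "text"


def reading_order(sections: List[Section]) -> List[Section]:
    """Assumed order: top-to-bottom, then left-to-right."""
    return sorted(sections, key=lambda s: (s.top, s.left))


# Class-specific extraction: each class gets its own extractor.
EXTRACTORS: Dict[str, Callable[[str], Any]] = {
    "text": lambda raw: " ".join(raw.split()),
    "table": lambda raw: [
        [cell.strip() for cell in row.split("|")]
        for row in raw.strip().splitlines()
    ],
    "figure": lambda raw: {"caption": raw.strip()[len("[figure]"):].strip()},
}


def parse(document: List[dict]) -> List[dict]:
    """Segment, classify, order, extract by class, and structure the result."""
    sections = segment(document)
    for s in sections:
        s.label = classify(s)
    parsed = []
    for s in reading_order(sections):
        if s.label == "noise":
            continue  # assumption: noise contributes nothing to the parsed document
        parsed.append({"class": s.label, "content": EXTRACTORS[s.label](s.raw)})
    return parsed
```

Calling `parse` on a list of region dicts with `raw`, `top`, and `left` keys returns a list of `{"class", "content"}` entries in the assumed reading order. Keeping extraction in a per-class lookup mirrors the abstract's "extracting content ... based on the class" step while leaving the ordering logic independent of it.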
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.