System and method for identifying document structure and associated metainformation
US7937338B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Apr 30, 2008 |
| Grant date | May 3, 2011 |
| Priority date | — |
| Expiry date | Dec 10, 2028 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06V30/416
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A system and method for processing documents that use the textual content and layout of the documents, including visual indicators, to process documents of various types more efficiently and reliably. The system and method identify visually distinguishable elements within a document, such as section and sub-section boundary indicators, and use them to mark, divide, and label section boundaries and content types so that the sections are more clearly identifiable and more easily processed. The system and method use known elements, including section heading types, keywords, section type classifiers, sub-section heading constructs, stop words, and the like, to adaptively identify and process a broad range of document types. These known elements are continually refined and updated, and users can discover and define new elements for further refinement and updating.
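The idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the heading table `KNOWN_HEADINGS`, the visual cue in `looks_like_heading` (a short line that is all caps or ends with a colon), and the helper names are all assumptions chosen for illustration. The sketch splits text at lines that look like headings, labels each section from a table of known headings, and collects unrecognized headings so that a user could define them later.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical "known elements": normalized heading text -> section type.
KNOWN_HEADINGS = {"summary": "SUMMARY", "background": "BACKGROUND"}


@dataclass
class Section:
    label: str                      # section type, or "UNKNOWN"
    heading: str                    # heading text as it appeared
    lines: list[str] = field(default_factory=list)


def looks_like_heading(line: str) -> bool:
    """Assumed visual cue: a short line that is all caps or ends with ':'."""
    text = line.strip()
    if not text or len(text) > 60:
        return False
    return text.endswith(":") or text.isupper()


def normalize(line: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", "", line.lower())
    return re.sub(r"\s+", " ", cleaned).strip()


def segment(text: str, known: dict[str, str]) -> tuple[list[Section], list[str]]:
    """Split text into labeled sections; also return unrecognized headings."""
    sections = [Section("PREAMBLE", "")]
    unknown: list[str] = []
    for line in text.splitlines():
        if looks_like_heading(line):
            key = normalize(line)
            label = known.get(key)
            if label is None:
                label = "UNKNOWN"
                unknown.append(key)  # candidate for a user-defined element
            sections.append(Section(label, line.strip()))
        elif line.strip():
            sections[-1].lines.append(line.strip())
    return sections, unknown


doc = "Report 42\nSUMMARY\nAll good.\nBackground:\nSome context.\nNEXT STEPS\nShip it.\n"
sections, unknown = segment(doc, KNOWN_HEADINGS)

# Refinement step: a user defines the new heading, then the text is re-segmented.
refined = dict(KNOWN_HEADINGS, **{"next steps": "PLAN"})
sections_refined, unknown_refined = segment(doc, refined)
```

In this sketch the first pass leaves `NEXT STEPS` labeled `UNKNOWN` and reports it in `unknown`. Adding it to the heading table and re-running is a simple stand-in for the refinement loop the abstract describes.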
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.