Patent · US Active

System and method for identifying document structure and associated metainformation

US7937338B2 · kind B2 · utility

Cited by: 10
References: 0
Claims: 21
Family size: 0

Assignee

Inventors

Key dates

Filing date: Apr 30, 2008
Grant date: May 3, 2011
Priority date
Expiry date: Dec 10, 2028

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06V30/416
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A system and method for processing documents by utilizing the textual content and layout of the documents, including visual indicators, to process documents more efficiently and reliably across various document types. The system and method identify visually distinguishable elements within the document, such as section and sub-section boundary indicators, to mark, divide and label the boundaries and content type so that the sections are more clearly identifiable and easily processed. The system and method use known elements, including section heading types, keywords, section type classifiers, sub-section heading constructs, stop words, and the like, to adaptively identify and process a broad range of document types. The system and method continually refine and update these known elements and allow users to discover and define new elements for further refinement and updating.
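
The abstract describes section identification only at a high level. The following is a minimal Python sketch of the general idea, not the patented method. It assumes three visual boundary cues (a short line that is all caps, ends with a colon, or is numbered), a hypothetical keyword vocabulary `KNOWN_HEADINGS`, and a hypothetical stop-word list. It models the "discover and define new elements" step as a hypothetical `learn_heading` helper that adds a user-confirmed heading to the vocabulary.

```python
import re
from dataclasses import dataclass, field

# Hypothetical "known elements"; the patent abstract does not list actual vocabularies.
KNOWN_HEADINGS = {"summary", "experience", "education", "skills"}
STOP_WORDS = {"and", "of", "the", "a", "an"}
# Assumed numbering cue: headings such as "1. Summary", "2.3) Education", "IV. Skills".
NUMBERED = re.compile(r"^(\d+(\.\d+)*|[IVX]+)[.)]\s+")


@dataclass
class Section:
    label: str                      # content type assigned to the section
    heading: str                    # the boundary line that opened the section
    body: list[str] = field(default_factory=list)


def has_visual_cue(line: str) -> bool:
    """Treat a short line (at most 6 words) as a boundary if it is all caps,
    ends with a colon, or starts with a number/roman-numeral marker."""
    text = line.strip()
    if not text or len(text.split()) > 6:
        return False
    return text.isupper() or text.endswith(":") or bool(NUMBERED.match(text))


def classify(line: str, known: set[str]) -> str:
    """Label a heading by the first known keyword it contains, skipping stop words."""
    for word in re.findall(r"[A-Za-z]+", line.lower()):
        if word not in STOP_WORDS and word in known:
            return word
    return "unknown"


def segment(text: str, known: set[str]) -> list[Section]:
    """Split text into labeled sections at visually marked boundary lines."""
    sections = [Section(label="preamble", heading="")]
    for line in text.splitlines():
        if has_visual_cue(line):
            sections.append(Section(classify(line, known), line.strip()))
        elif line.strip():
            sections[-1].body.append(line.strip())
    # Drop the preamble if nothing appeared before the first heading.
    return [s for s in sections if s.heading or s.body]


def learn_heading(known: set[str], heading: str) -> None:
    """Hypothetical refinement step: add a user-confirmed heading's
    content words (minus stop words) to the known vocabulary."""
    for word in re.findall(r"[A-Za-z]+", heading.lower()):
        if word not in STOP_WORDS:
            known.add(word)


if __name__ == "__main__":
    doc = "Jane Doe\nSUMMARY\nBackend engineer.\nExperience:\nBuilt payment services.\n"
    doc += "2. Certifications\nAWS Solutions Architect\n"
    known = set(KNOWN_HEADINGS)
    for s in segment(doc, known):
        print(s.label, s.body)
    learn_heading(known, "Certifications")   # user defines a new element
    print([s.label for s in segment(doc, known)])
```

In the demo, "2. Certifications" is first labeled `unknown`. After the user confirms it as a heading, the same document is re-segmented with the label `certifications`. A real system would likely weight these cues with layout features (font size, spacing) and trained classifiers rather than fixed rules.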

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.