Patent · US Expired

Method for content mining of semi-structured documents

US6912555B2 · kind B2 · utility

Cited by: 31
References: 13
Claims: 21
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jan 18, 2002
Grant date: Jun 28, 2005
Priority date
Expiry date: Jan 17, 2024

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/30
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Embodiments of the present invention are directed to a method for content mining of semi-structured documents. In one embodiment, a semi-structured document is first converted from a document-type-specific format, such as HTML or PDF, to a document-type-independent format, such as XML. The document formatting, which contains basic-level information about the document's structure, is then analyzed by a series of modules to develop a higher-level understanding of that structure. These modules append information to the document describing the features that collectively make up the higher-level document structure. The appended information makes it easier to find specified information within the document when content mining is performed.
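
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes the conversion step has already produced a format-independent XML document, and the `<block>` element, its `font-size` and `bold` attributes, the appended `role` and `section` attributes, and all function names are hypothetical choices made for illustration. Two small modules run in sequence, each appending information to the document, and a final lookup uses the appended information to find content.

```python
import xml.etree.ElementTree as ET

# Hypothetical format-independent XML: each <block> carries only
# low-level formatting, as a converter from HTML or PDF might emit.
DOC = """
<document>
  <block font-size="18" bold="true">Quarterly Report</block>
  <block font-size="11" bold="false">Revenue rose in the third quarter.</block>
  <block font-size="14" bold="true">Outlook</block>
  <block font-size="11" bold="false">Growth is expected to continue.</block>
</document>
"""


def tag_headings(root):
    """Module 1 (assumed heuristic): bold blocks larger than the most
    common font size are marked as headings, everything else as body."""
    sizes = [int(b.get("font-size")) for b in root.iter("block")]
    body_size = max(set(sizes), key=sizes.count)
    for b in root.iter("block"):
        is_heading = b.get("bold") == "true" and int(b.get("font-size")) > body_size
        b.set("role", "heading" if is_heading else "body")


def tag_sections(root):
    """Module 2: label each body block with the heading it falls under.
    Relies on the 'role' attribute appended by tag_headings."""
    current = None
    for b in root.iter("block"):
        if b.get("role") == "heading":
            current = b.text
        elif current is not None:
            b.set("section", current)


def analyze(xml_text):
    """Run the modules in order; each one appends attributes to the tree."""
    root = ET.fromstring(xml_text)
    for module in (tag_headings, tag_sections):
        module(root)
    return root


def find_in_section(root, section):
    """Content mining step: use the appended 'section' attribute to
    retrieve the text that belongs to a named section."""
    return [b.text for b in root.iter("block") if b.get("section") == section]


root = analyze(DOC)
print(find_in_section(root, "Outlook"))  # prints ['Growth is expected to continue.']
```

Ordering matters in this sketch: the section module depends on the annotations left by the heading module, which mirrors the idea of a series of modules building a higher-level view on top of basic formatting.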

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.