Patent · US Active

Apparatus, method, and program that performs syntax parsing on a structured document in the form of electronic data

US8181105B2 · kind B2 · utility

Cited by: 0
References: 3
Claims: 18
Family size: 0

Assignee

Inventors

Key dates

Filing date: Apr 3, 2008
Grant date: May 15, 2012
Priority date
Expiry date: Feb 22, 2031

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/143
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Statistical information about instance documents, together with schema information, is used to integrate multiple state transitions that enable sectioning of a structured document, thereby generating an optimum automaton. When state transitions are integrated, consecutively matching state transitions are held in the form of an ID list, which is then used to count the number of consecutive state transitions. Furthermore, patterns in the number of occurrences of repetitive elements, including nested elements, are obtained statistically. Variations in blanks (whitespace) in XML are handled by a statistical method. Schema information is used to build the automaton beforehand, thereby reducing the initialization overhead of the syntax parsing apparatus.
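To make two of the abstract's ideas concrete (holding consecutively matching state transitions as an ID list with a repeat count, and statistically tallying how many times repetitive elements occur across instance documents), here is a minimal sketch under simplifying assumptions. It treats each XML token as one state transition, abstracts character data to a hypothetical `#text` label and whitespace-only runs to a hypothetical `#ws` label while counting the exact blank variants. The regex tokenizer and the names `TransitionTable`, `to_id_list`, `collapse_repeats`, and `repetition_stats` are illustrative inventions, and the greedy folding is a simple stand-in rather than the patented integration method; schema-driven prebuilding of the automaton is not modeled.

```python
import re
from collections import Counter, defaultdict

# Hypothetical tokenizer (assumption): a token is either a tag or a run of character data.
TOKEN_RE = re.compile(r"<[^>]+>|[^<]+")


class TransitionTable:
    """Hypothetical helper: maps each distinct token label to a small integer transition ID."""

    def __init__(self) -> None:
        self.ids: dict[str, int] = {}

    def id_of(self, label: str) -> int:
        return self.ids.setdefault(label, len(self.ids))


def to_id_list(xml: str, table: TransitionTable, blanks: Counter) -> list[int]:
    """Convert a document into a list of transition IDs.

    Assumption: tags are kept verbatim, non-blank character data becomes '#text',
    and whitespace-only runs become '#ws' while their exact form is tallied in `blanks`.
    """
    ids = []
    for tok in TOKEN_RE.findall(xml):
        if tok.startswith("<"):
            label = tok
        elif tok.isspace():
            blanks[tok] += 1
            label = "#ws"
        else:
            label = "#text"
        ids.append(table.id_of(label))
    return ids


def collapse_repeats(ids: list[int], max_period: int = 8) -> list[tuple[tuple[int, ...], int]]:
    """Greedily fold consecutive repetitions of an ID pattern into (pattern, count) pairs."""
    out = []
    i, n = 0, len(ids)
    while i < n:
        best_p, best_count = 1, 1
        for p in range(1, min(max_period, n - i) + 1):
            pattern = ids[i:i + p]
            count = 1
            # A slice past the end is shorter than `pattern`, so the loop stops there.
            while ids[i + count * p:i + (count + 1) * p] == pattern:
                count += 1
            # Prefer the pattern covering the most IDs; ties keep the shorter period.
            if count > 1 and count * p > best_count * best_p:
                best_p, best_count = p, count
        out.append((tuple(ids[i:i + best_p]), best_count))
        i += best_p * best_count
    return out


def repetition_stats(docs: list[str]):
    """Tally, across instance documents, how often each repeated pattern occurs N times in a row."""
    table = TransitionTable()
    blanks: Counter = Counter()
    stats: dict[tuple[int, ...], Counter] = defaultdict(Counter)
    for doc in docs:
        for pattern, count in collapse_repeats(to_id_list(doc, table, blanks)):
            if count > 1:
                stats[pattern][count] += 1
    return table, blanks, stats


if __name__ == "__main__":
    docs = [
        "<list><item>a</item><item>b</item><item>c</item></list>",
        "<list><item>x</item><item>y</item><item>z</item></list>",
        "<list><item>q</item><item>r</item></list>",
    ]
    table, blanks, stats = repetition_stats(docs)
    item = tuple(table.id_of(t) for t in ("<item>", "#text", "</item>"))
    print(stats[item])  # Counter({3: 2, 2: 1})
```

In this toy run, the `<item>` pattern repeats three times in two documents and twice in one, so a parser tuned on these statistics might prepare for three repetitions first. The same tally over `blanks` indicates which indentation variant is most common.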

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.