Apparatus, method, and program that performs syntax parsing on a structured document in the form of electronic data
US8181105B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Apr 3, 2008 |
| Grant date | May 15, 2012 |
| Priority date | — |
| Expiry date | Feb 22, 2031 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/143
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Statistical information about instance documents and schema information are used to integrate multiple state transitions that enable sectioning of a structured document, thereby generating an optimum automaton. In integrating state transitions, consecutively matching state transitions are held in the form of an ID list, which is then used to count the number of consecutive state transitions. Furthermore, patterns in the number of occurrences of repetitive elements, including nested elements, are statistically obtained. Variations of blanks in XML are addressed by using a statistical method. Schema information is used to build an automaton beforehand, thereby reducing the initialization overhead of the syntax parsing apparatus.
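The integration step described in the abstract can be illustrated with a minimal sketch. It is not the patented method itself, only one plausible reading: transition IDs observed while parsing sample instance documents are scanned, and transitions that always occur back to back are collected into ID lists, whose length gives the number of consecutive transitions. The function names (`count_followers`, `integrate_transitions`), the `threshold` parameter, and the use of tag strings as transition IDs are all assumptions made for illustration.

```python
from collections import Counter


def count_followers(traces):
    """Count how often each transition ID occurs and which ID follows it."""
    occurrences = Counter()
    followers = Counter()
    for trace in traces:
        for i, tid in enumerate(trace):
            occurrences[tid] += 1
            if i + 1 < len(trace):
                followers[(tid, trace[i + 1])] += 1
    return occurrences, followers


def integrate_transitions(traces, threshold=1.0):
    """Group transitions that (almost) always occur consecutively into ID lists.

    `threshold` is a hypothetical tuning knob: a pair (a, b) is merged only if
    b follows a, and a precedes b, in at least that fraction of occurrences.
    """
    if not 0.5 < threshold <= 1.0:
        raise ValueError("threshold must be in (0.5, 1.0]")
    occurrences, followers = count_followers(traces)

    # With threshold > 0.5 each ID has at most one successor and one predecessor.
    successor = {}
    for (a, b), n in followers.items():
        if a != b and n / occurrences[a] >= threshold and n / occurrences[b] >= threshold:
            successor[a] = b
    has_predecessor = set(successor.values())

    chains, seen = [], set()
    # Start a chain at every ID that no other ID reliably leads into.
    for tid in occurrences:
        if tid in has_predecessor or tid in seen:
            continue
        chain = [tid]
        seen.add(tid)
        while chain[-1] in successor:
            chain.append(successor[chain[-1]])
            seen.add(chain[-1])
        chains.append(chain)
    # IDs caught in a pure cycle have no chain head; keep them unmerged.
    for tid in occurrences:
        if tid not in seen:
            chains.append([tid])
            seen.add(tid)
    return chains
```

Each returned list stands for one integrated transition, and `len(chain)` is the count of consecutive transitions it replaces. For sample traces in which every `<item>` contains exactly one `<name>`, the four transitions for `<item>`, `<name>`, `</name>`, and `</item>` would fall into a single list, while `<order>` and `</order>` stay separate because the number of items between them varies.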
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.