System and method of automatic wrapper grammar generation
US6792576B1 · kind B1 · utility
Assignee
Inventor
Key dates
| Filing date | Jul 26, 1999 |
| Grant date | Sep 14, 2004 |
| Priority date | — |
| Expiry date | Jul 26, 2019 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/258
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method for generating a wrapper grammar for a file having a structure of a particular format includes providing at least one sample file of the particular format, where the particular format comprises a plurality of string tokens. Each sample file includes a plurality of tokens (data strings), each of which may be actual data from the document, an HTML tag, or some other grammatical separator. The sample file of the particular format is then processed by annotating attributable tokens with a user-defined attribute (such as Author or Title) from a set of attributes to form an annotated sample set. The annotated sample set is then evaluated to determine whether wrapper grammar generation is possible, and if it is possible, a wrapper grammar for files having a structure of the particular format is generated. Preferably, the annotated sample set is evaluated by determining whether all attributes in the annotated sample set are distinguishable from one another.
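The annotate, evaluate, generate flow described in the abstract can be illustrated with a small Python sketch. This is not the patented method: the abstract does not say how distinguishability is decided, so the sketch assumes a deliberately simple criterion in which two attributes are distinguishable when no token that immediately precedes one of them also immediately precedes the other. All function names (left_contexts, distinguishable, generate_wrapper, extract) and the sample data are hypothetical.

```python
from collections import defaultdict

# An annotated sample is a list of (token, attribute) pairs.
# attribute is None for tags and separators, or a user-defined label
# such as "Author" or "Title" for attributable tokens.

def left_contexts(sample):
    """Map each attribute to the set of tokens that immediately precede it."""
    contexts = defaultdict(set)
    prev = None
    for token, attr in sample:
        if attr is not None:
            contexts[attr].add(prev)
        prev = token
    return contexts

def distinguishable(samples):
    """Assumed criterion (not taken from the patent): no preceding token
    is shared by two different attributes."""
    owner = {}
    for sample in samples:
        for attr, ctxs in left_contexts(sample).items():
            for ctx in ctxs:
                if owner.setdefault(ctx, attr) != attr:
                    return False
    return True

def generate_wrapper(samples):
    """Return a toy 'grammar' mapping a preceding token to an attribute,
    or None when the attributes cannot be told apart."""
    if not distinguishable(samples):
        return None
    rules = {}
    for sample in samples:
        for attr, ctxs in left_contexts(sample).items():
            for ctx in ctxs:
                rules[ctx] = attr
    return rules

def extract(rules, tokens):
    """Apply the toy grammar to the tokens of a new file of the same format."""
    result = {}
    prev = None
    for token in tokens:
        if prev in rules:
            result[rules[prev]] = token
        prev = token
    return result

# Hypothetical annotated sample: tags are unlabelled, data tokens are labelled.
sample = [
    ("<b>", None), ("Jane Doe", "Author"), ("</b>", None),
    ("<i>", None), ("Wrapper Basics", "Title"), ("</i>", None),
]
rules = generate_wrapper([sample])
record = extract(rules, ["<b>", "A. Smith", "</b>", "<i>", "Grammars", "</i>"])

# A sample in which both attributes follow the same tag is rejected.
ambiguous = [
    ("<b>", None), ("Jane Doe", "Author"), ("</b>", None),
    ("<b>", None), ("Wrapper Basics", "Title"), ("</b>", None),
]
no_rules = generate_wrapper([ambiguous])
```

In this sketch the first sample yields rules that pull Author and Title out of a new file with the same layout, while the second sample is refused because both attributes follow the same tag, which mirrors the abstract's point that generation proceeds only when the attributes can be told apart.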
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.