Patent · US Expired

System and method of automatic wrapper grammar generation

US6792576B1 · kind B1 · utility

Cited by: 21
References: 13
Claims: 18
Family size: 0

Key dates

Filing date: Jul 26, 1999
Grant date: Sep 14, 2004
Expiry date: Jul 26, 2019

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/258
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method for generating a wrapper grammar for a file having a structure of a particular format includes providing at least one sample file of the particular format, where the particular format comprises a plurality of string tokens. Each sample file includes a plurality of tokens (data strings), each of which may be actual data from the document, an HTML tag, or some other grammatical separator. The sample file is then processed by annotating attributable tokens with user-defined attributes, such as Author or Title, drawn from a set of attributes, to form an annotated sample set. The annotated sample set is then evaluated to determine whether wrapper grammar generation is possible; if it is, a wrapper grammar for files having a structure of the particular format is generated. Preferably, the annotated sample set is evaluated by determining whether all attributes in the annotated sample set are distinguishable from one another.
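The abstract describes three steps: annotate tokens in a sample file, check that the annotated attributes are distinguishable, and only then generate a wrapper grammar. The sketch below illustrates that flow under stated assumptions and is not the patented method. It assumes a simple regex tokenizer that splits HTML into tags and text runs, treats two attributes as distinguishable when the k tokens preceding them differ, and represents the "grammar" as a lookup table from that left context to an attribute. The names `tokenize`, `annotate`, `build_grammar`, `generate_wrapper` and `extract` are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical tokenizer (an assumption): each token is an HTML tag or a text run.
TOKEN_RE = re.compile(r"<[^>]+>|[^<]+")

Context = Tuple[str, ...]
Sample = Tuple[List[str], Dict[int, str]]  # (tokens, token index -> attribute)


def tokenize(doc: str) -> List[str]:
    """Split a document into tags and stripped text runs, dropping blank text."""
    return [t for t in (m.group().strip() for m in TOKEN_RE.finditer(doc)) if t]


def annotate(doc: str, values: Dict[str, str]) -> Sample:
    """User annotation step: label each token whose text appears in `values`."""
    tokens = tokenize(doc)
    labels = {i: values[t] for i, t in enumerate(tokens) if t in values}
    return tokens, labels


def left_context(tokens: List[str], i: int, k: int) -> Context:
    """The (up to) k tokens immediately before position i."""
    return tuple(tokens[max(0, i - k):i])


def build_grammar(samples: List[Sample], k: int) -> Optional[Dict[Context, str]]:
    """Map each left context to a single attribute.

    Returns None when two different attributes share a context, i.e. the
    attributes are not distinguishable under this (assumed) criterion.
    """
    grammar: Dict[Context, str] = {}
    for tokens, labels in samples:
        for i, attr in labels.items():
            if grammar.setdefault(left_context(tokens, i, k), attr) != attr:
                return None
    return grammar


def generate_wrapper(
    samples: List[Sample], max_k: int = 5
) -> Optional[Tuple[int, Dict[Context, str]]]:
    """Evaluation step: try context lengths 1..max_k, return the first that works."""
    for k in range(1, max_k + 1):
        grammar = build_grammar(samples, k)
        if grammar is not None:
            return k, grammar
    return None  # generation judged not possible within max_k


def extract(grammar: Dict[Context, str], doc: str, k: int) -> Dict[str, List[str]]:
    """Apply the learned contexts to a new document of the same format."""
    tokens = tokenize(doc)
    found: Dict[str, List[str]] = {}
    for i, tok in enumerate(tokens):
        attr = grammar.get(left_context(tokens, i, k))
        if attr is not None:
            found.setdefault(attr, []).append(tok)
    return found


if __name__ == "__main__":
    sample = annotate(
        "<html><b>Title:</b><i>Dune</i><b>Author:</b><i>Herbert</i></html>",
        {"Dune": "Title", "Herbert": "Author"},
    )
    result = generate_wrapper([sample])
    assert result is not None
    k, grammar = result
    page = "<html><b>Title:</b><i>Emma</i><b>Author:</b><i>Austen</i></html>"
    print(k, extract(grammar, page, k))  # 3 {'Title': ['Emma'], 'Author': ['Austen']}
```

In this example, contexts of length 1 and 2 (`<i>` and `</b><i>`) are shared by Title and Author, so the evaluation step rejects them; at k = 3 the preceding label text (`Title:` versus `Author:`) separates the two attributes, and the resulting table extracts the same fields from a new page with the same layout. A fixed-length left context is only one possible distinguishability test; richer grammars could use right context or token classes instead.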

Source: USPTO / EPO open patent data (bibliographic data and citation counts).