Patent · US Expired

Learning automatic data extraction system

US6662190B2 · kind B2 · utility

Cited by: 7
References: 14
Claims: 9
Family size: 0

Assignee

Inventors

Key dates

Filing date: Mar 20, 2001
Grant date: Dec 9, 2003
Priority date
Expiry date: Feb 8, 2022

Classification

  • Technology area (CPC Y): Emerging Cross-Sectional Technologies
  • CPC primary: Y10S707/99943
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

An improvement to an automatic data extractor has the capability of discovering new values that are not recognized by the vocabulary of the automatic data extractor and adding them to the record being formed and to the vocabulary, thus accumulating new vocabulary through use. The extractor gleans new values by deducing them from the structure of the text data and learns them by adding them to its vocabulary. The data extractor determines the structure of the data in much the same way as prior art data extractors but then a discovery process is used to identify a series of field lists using preferably at least one field parser and a field grader. The results of the grader are returned to an attribute mapper that identifies the position in the field list for each of the attributes. The content of each field, if not already added to the record and associated with the correct attribute using the recognizer, can now be associated by its position in the field list with an attribute and written to the record as the value for that attribute. Furthermore, a learner assigns that field to the vocabulary list if not already present in the vocabulary.
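
The flow described in the abstract (field parser, field grader, attribute mapper, recognizer, learner) can be illustrated with a minimal Python sketch. This is not the patented implementation: every function name, the delimiter-based parsers, the grading rule (how consistently lines split into the same number of fields), and the voting-based mapper are assumptions chosen only to make the idea concrete.

```python
from collections import Counter


def parse_fields(line, delimiter):
    """Hypothetical field parser: split one line into a field list."""
    return [field.strip() for field in line.split(delimiter)]


def grade_field_lists(field_lists):
    """Hypothetical field grader: fraction of lines sharing the most
    common field count; parses that yield a single field score 0."""
    if not field_lists:
        return 0.0
    counts = Counter(len(fields) for fields in field_lists)
    length, freq = counts.most_common(1)[0]
    return freq / len(field_lists) if length > 1 else 0.0


def recognize(value, vocabulary):
    """Recognizer: return the attribute whose vocabulary holds the value."""
    for attribute, known_values in vocabulary.items():
        if value.lower() in known_values:
            return attribute
    return None


def map_attributes(field_lists, vocabulary):
    """Hypothetical attribute mapper: each recognized value votes for
    the attribute at its position in the field list."""
    votes = {}
    for fields in field_lists:
        for position, value in enumerate(fields):
            attribute = recognize(value, vocabulary)
            if attribute is not None:
                votes.setdefault(position, Counter())[attribute] += 1
    return {pos: c.most_common(1)[0][0] for pos, c in votes.items()}


def extract_and_learn(lines, vocabulary, delimiters=(",", "|", "\t")):
    """Build one record per line and add newly discovered values
    to the vocabulary (the 'learner' step)."""
    # Try every candidate parser and keep the best-graded result.
    candidates = [[parse_fields(line, d) for line in lines] for d in delimiters]
    best = max(candidates, key=grade_field_lists)
    position_to_attribute = map_attributes(best, vocabulary)

    records = []
    for fields in best:
        record = {}
        for position, value in enumerate(fields):
            # Prefer the recognizer; fall back to the field's position.
            attribute = recognize(value, vocabulary)
            if attribute is None:
                attribute = position_to_attribute.get(position)
            if attribute is None:
                continue  # no attribute could be deduced for this field
            record[attribute] = value
            # Learner: remember the value for future runs.
            vocabulary.setdefault(attribute, set()).add(value.lower())
        records.append(record)
    return records


if __name__ == "__main__":
    vocab = {"make": {"ford"}, "color": {"red", "blue"}}
    for rec in extract_and_learn(["Ford, red", "Ford, blue", "Saab, teal"], vocab):
        print(rec)
    print(vocab)
```

In this sketch, "Saab" and "teal" are not in the starting vocabulary, yet they are assigned to attributes because other lines establish which attribute occupies each position; both values are then added to the vocabulary, mirroring the accumulation through use that the abstract describes.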

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.