Patent · US Active

Using a data mining algorithm to generate format rules used to validate data sets

US8166000B2 · kind B2 · utility

Cited by: 32
References: 24
Claims: 30
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jun 27, 2007
Grant date: Apr 24, 2012
Priority date
Expiry date: Jul 17, 2030

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/215
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Provided are a method, system, and article of manufacture for using a data mining algorithm to generate format rules used to validate data sets. A data set has a plurality of columns and records providing data for each of the columns. Selection is received of at least one format column for which format rules are to be generated and selection is received of at least one predictor column. A format mask column is generated for each selected format column. For records in the data set, a value in the at least one format column is converted to a format mask representing a format of the value in the format column, and the format mask is stored in the format mask column in the record for which the format mask was generated. The at least one predictor column and the at least one format mask column are processed to generate at least one format rule. Each format rule specifies a format mask associated with at least one condition in the at least one predictor column.
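
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The mask alphabet (digits become "9", letters become "A", other characters are kept), the function names, the sample records, and the `min_confidence` threshold are all assumptions made for illustration. The abstract does not name a specific data mining algorithm, so a simple per-condition frequency count stands in for it here.

```python
from collections import Counter, defaultdict


def to_format_mask(value):
    """Convert a value to a format mask (assumed alphabet: digit -> '9', letter -> 'A')."""
    out = []
    for ch in str(value):
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)  # punctuation and spaces are kept literally
    return "".join(out)


def add_mask_column(records, format_col):
    """Store each record's format mask in a new '<format_col>_mask' column."""
    mask_col = format_col + "_mask"
    for rec in records:
        rec[mask_col] = to_format_mask(rec[format_col])
    return mask_col


def mine_format_rules(records, predictor_col, mask_col, min_confidence=0.6):
    """Stand-in for the data mining step: for each predictor value, keep the
    most frequent mask as a rule if its share reaches min_confidence."""
    counts = defaultdict(Counter)
    for rec in records:
        counts[rec[predictor_col]][rec[mask_col]] += 1
    rules = []
    for condition, masks in counts.items():
        mask, hits = masks.most_common(1)[0]
        confidence = hits / sum(masks.values())
        if confidence >= min_confidence:
            rules.append({"predictor": predictor_col, "equals": condition,
                          "mask": mask, "confidence": confidence})
    return rules


def find_violations(records, rules, mask_col):
    """Return records that satisfy a rule's condition but not its mask."""
    bad = []
    for rec in records:
        for rule in rules:
            if rec[rule["predictor"]] == rule["equals"] and rec[mask_col] != rule["mask"]:
                bad.append(rec)
    return bad


# Hypothetical sample data: 'zip' is the format column, 'country' the predictor.
records = [
    {"country": "US", "zip": "95120"},
    {"country": "US", "zip": "10001"},
    {"country": "US", "zip": "9512A"},
    {"country": "CA", "zip": "K1A 0B1"},
    {"country": "CA", "zip": "M5V 2T6"},
]

mask_col = add_mask_column(records, "zip")
rules = mine_format_rules(records, "country", mask_col)
violations = find_violations(records, rules, mask_col)
```

With this sample data, the sketch derives one rule per country value and flags the record whose zip value is "9512A", since its mask differs from the mask the rule associates with the condition country = "US".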

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.