Using a data mining algorithm to generate format rules used to validate data sets
US8166000B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jun 27, 2007 |
| Grant date | Apr 24, 2012 |
| Priority date | — |
| Expiry date | Jul 17, 2030 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/215
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
Provided are a method, system, and article of manufacture for using a data mining algorithm to generate format rules used to validate data sets. A data set has a plurality of columns and records providing data for each of the columns. A selection is received of at least one format column for which format rules are to be generated, and a selection is received of at least one predictor column. A format mask column is generated for each selected format column. For records in the data set, the value in the at least one format column is converted to a format mask representing the format of that value, and the format mask is stored in the format mask column of the record for which it was generated. The at least one predictor column and the at least one format mask column are processed to generate at least one format rule. Each format rule specifies a format mask associated with at least one condition in the at least one predictor column.
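The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the mask alphabet (9 for digits, A and a for letters), the "_mask" column suffix, the function names, the sample records, and the 0.6 confidence threshold are all assumptions made for illustration, and a simple frequency count stands in for whatever data mining algorithm an actual system would use.

```python
from collections import Counter, defaultdict


def to_format_mask(value):
    """Map each character to a class symbol (assumed alphabet):
    digit -> 9, uppercase letter -> A, any other letter -> a."""
    out = []
    for ch in str(value):
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)  # punctuation and spaces are kept literally
    return "".join(out)


def add_mask_columns(records, format_columns):
    """Return copies of the records with a '<column>_mask' field
    added for every selected format column."""
    masked = []
    for rec in records:
        row = dict(rec)
        for col in format_columns:
            row[col + "_mask"] = to_format_mask(rec[col])
        masked.append(row)
    return masked


def mine_format_rules(records, predictor_columns, mask_column, min_confidence=0.6):
    """Stand-in for the mining step: for each combination of predictor
    values, keep the most frequent mask if it is frequent enough."""
    counts = defaultdict(Counter)
    for rec in records:
        condition = tuple((col, rec[col]) for col in predictor_columns)
        counts[condition][rec[mask_column]] += 1
    rules = []
    for condition, mask_counts in counts.items():
        mask, hits = mask_counts.most_common(1)[0]
        confidence = hits / sum(mask_counts.values())
        if confidence >= min_confidence:
            rules.append((dict(condition), mask, confidence))
    return rules


# Hypothetical data set: "phone" is the format column, "country" the predictor.
records = [
    {"country": "US", "phone": "555-123-4567"},
    {"country": "US", "phone": "555-987-6543"},
    {"country": "US", "phone": "5559876543"},
    {"country": "DE", "phone": "030 1234567"},
    {"country": "DE", "phone": "030 7654321"},
]
masked = add_mask_columns(records, ["phone"])
rules = mine_format_rules(masked, ["country"], "phone_mask")
for condition, mask, confidence in rules:
    print(condition, "->", mask, round(confidence, 2))
```

Each resulting rule pairs a predictor condition with the mask expected under it. A validator built on such rules could flag any record whose mask differs from the one its predictor values imply, such as the third US record in this sample.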
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.