System and method for determining numerical representations for categorical data fields
US7272590B2 · kind B2 · utility
Key dates
| Event | Date |
|---|---|
| Filing date | Mar 6, 2003 |
| Grant date | Sep 18, 2007 |
| Priority date | Not listed |
| Expiry date | Oct 3, 2025 |
Classification
- Technology area (CPC Y): Emerging Cross-Sectional Technologies
- CPC primary: Y10S707/99942
- WIPO field: Basic communication processes
- WIPO sector: Electrical engineering
Abstract
A system and method determine numerical representations for categorical data fields by exploiting the redundancy of the data records to discover an order of the categories automatically. A categorical data field is recoded by creating a separate table for each numerical data field occurring in the data records. Each table is sorted by the numerical values of its respective data field. Each category is then recoded according to the average sort position of its occurrences in a given sorted table. The standard deviation of the resulting numerical category codes is calculated for each of the separate recoding tables. The recoding table with the maximum standard deviation is selected and used to recode the categories contained in the respective categorical data field of the data records. A plausibility check is performed on the selected recoding table by excluding the numerical data field on which its sorting was based and recreating the recoding table from the data records. The resulting recoding table is compared with the original recoding table. Resultin…
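The recoding and selection steps described in the abstract can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patented implementation: records are plain dicts, the function names `recode_by_field` and `select_recoding` and the field names `size`, `weight` and `noise` are hypothetical, sort positions are 0-based ranks, ties in a numerical field keep their input order, and the population standard deviation is used. The plausibility check is not sketched, because the abstract does not say how the recreated and original tables are compared.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple


def recode_by_field(records: Sequence[dict], cat_field: str,
                    num_field: str) -> Dict[str, float]:
    """Build one recoding table: sort the records by num_field and map each
    category to the average sort position (0-based rank) of its occurrences."""
    # sorted() is stable, so records with equal values keep their input order
    # (an assumption; averaging tied ranks would be an alternative).
    ordered = sorted(records, key=lambda r: r[num_field])
    positions: Dict[str, List[int]] = {}
    for rank, rec in enumerate(ordered):
        positions.setdefault(rec[cat_field], []).append(rank)
    return {cat: mean(ranks) for cat, ranks in positions.items()}


def select_recoding(records: Sequence[dict], cat_field: str,
                    num_fields: Sequence[str]) -> Tuple[str, Dict[str, float]]:
    """Build one recoding table per numerical field and return the field whose
    table has the largest (population) standard deviation of category codes."""
    tables = {f: recode_by_field(records, cat_field, f) for f in num_fields}
    best = max(tables, key=lambda f: pstdev(list(tables[f].values())))
    return best, tables[best]


# Hypothetical example data: "weight" follows the size order, "noise" does not.
records = [
    {"size": "S", "weight": 1.0, "noise": 5},
    {"size": "M", "weight": 2.0, "noise": 1},
    {"size": "L", "weight": 3.0, "noise": 4},
    {"size": "S", "weight": 1.5, "noise": 2},
    {"size": "M", "weight": 2.5, "noise": 6},
    {"size": "L", "weight": 3.5, "noise": 3},
]

best_field, table = select_recoding(records, "size", ["weight", "noise"])
# Replace each category in the records with its numerical code.
recoded = [table[r["size"]] for r in records]
print(best_field, table)  # → weight {'S': 0.5, 'M': 2.5, 'L': 4.5}
```

In this toy data the categories' average ranks spread apart only when the sort key is related to the category order, which is presumably why the table with the largest standard deviation is chosen: sorting by the unrelated `noise` field gives every category the same code of 2.5, a standard deviation of 0.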
Source: USPTO / EPO open patent data.