System and method for fast identification of variable roles during initial data exploration
US9239867B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Nov 10, 2014 |
| Grant date | Jan 19, 2016 |
| Priority date | — |
| Expiry date | Nov 10, 2034 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06Q10/0631
- WIPO field: IT methods for management
- WIPO sector: Electrical engineering
Abstract
Systems and methods are provided for identifying data variable roles during initial data exploration. A variable type, unique data value count values, and an overflow count value are determined for a variable. The unique data value count values include a number of occurrences of each of a plurality of unique data values for the variable in a data set. The overflow count value is a number of occurrences of data values other than the plurality of unique data values for the variable in the data set. When a number of the plurality of unique data values is greater than a value for a high cardinality threshold, the variable is determined to be a high cardinality variable. When the variable is not determined to be the high cardinality variable, a class variable role is assigned to the variable. When the variable is determined to be the high cardinality variable, whether or not the variable is a numeric variable type is determined based on the determined variable type. When the variable is determined to not be the numeric variable type, the overflow count value is compared to the unique data value count values to determine whether or not rare visible values occurred for the variable. When …
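The first steps of the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `VariableSummary`, `summarize`, `first_steps`, the `max_tracked` cap, the default threshold, and the returned labels are all assumptions made for illustration. The sketch stops where the abstract text is cut off, so the rare-visible-value comparison and everything after it are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class VariableSummary:
    """Hypothetical container for the three quantities named in the abstract."""
    is_numeric: bool   # variable type, reduced here to numeric vs. not numeric
    counts: dict       # occurrences of each tracked unique data value
    overflow: int      # occurrences of values outside the tracked set


def summarize(values, max_tracked=5):
    """Count occurrences of up to `max_tracked` unique values (assumed cap).

    Any value seen after the cap is reached, and not already tracked,
    is added to the overflow count instead of getting its own entry.
    """
    counts = {}
    overflow = 0
    for v in values:
        if v in counts:
            counts[v] += 1
        elif len(counts) < max_tracked:
            counts[v] = 1
        else:
            overflow += 1
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return VariableSummary(is_numeric, counts, overflow)


def first_steps(summary, high_cardinality_threshold=3):
    """Apply only the decisions that the visible abstract text states."""
    # High cardinality: more tracked unique values than the threshold.
    if len(summary.counts) <= high_cardinality_threshold:
        return "class"  # class variable role is assigned
    if summary.is_numeric:
        # The abstract is truncated before describing this branch.
        return "high_cardinality_numeric"
    # Next step per the abstract: compare overflow to the unique counts
    # to detect rare visible values (rule not given in the visible text).
    return "high_cardinality_non_numeric"


if __name__ == "__main__":
    print(first_steps(summarize(["m", "f", "m", "f"])))
    print(first_steps(summarize(list(range(10)))))
    print(first_steps(summarize([f"id{i}" for i in range(10)])))
```

With these assumed defaults, a two-level column takes the class role, while a column of ten distinct integers or ten distinct strings is flagged as high cardinality and routed to the numeric or non-numeric branch.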
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.