Patent · US Active

System and method for fast identification of variable roles during initial data exploration

US9239867B2 · kind B2 · utility

Cited by: 0
References: 0
Claims: 33
Family size: 0

Assignee

Inventors

Key dates

Filing date: Nov 10, 2014
Grant date: Jan 19, 2016
Priority date:
Expiry date: Nov 10, 2034

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06Q10/0631
  • WIPO field: IT methods for management
  • WIPO sector: Electrical engineering

Abstract

Systems and methods are provided for identifying data variable roles during initial data exploration. A variable type, unique data value count values, and an overflow count value are determined for a variable. The unique data value count values include a number of occurrences of each of a plurality of unique data values for the variable in a data set. The overflow count value is a number of occurrences of data values other than the plurality of unique data values for the variable in the data set. When a number of the plurality of unique data values is greater than a value for a high cardinality threshold, the variable is determined to be a high cardinality variable. When the variable is not determined to be the high cardinality variable, a class variable role is assigned to the variable. When the variable is determined to be the high cardinality variable, whether or not the variable is a numeric variable type is determined based on the determined variable type. When the variable is determined to not be the numeric variable type, the overflow count value is compared to the unique data value count values to determine whether or not rare visible values occurred for the variable. When …
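
The decision steps that the abstract does state can be sketched in Python as follows. This is an illustrative reading only, not the claimed implementation. The names `summarize`, `explore_role`, `max_tracked`, and `high_cardinality_threshold`, and their default values, are hypothetical. The abstract does not say how the overflow count is compared to the per-value counts, so the rule used here (a tracked value counts as rare when it occurs fewer times than the overflow count) is an assumption. The abstract is truncated after the rare-value check, so the sketch assigns no role on the high cardinality branch and only reports the intermediate findings.

```python
from collections import Counter
from numbers import Number


def summarize(values, max_tracked=5):
    """Return (is_numeric, per-value counts, overflow count) for one variable.

    Up to max_tracked distinct values are counted individually; every
    occurrence of any other value is added to the overflow count.
    """
    counts = Counter()
    overflow = 0
    for v in values:
        if v in counts or len(counts) < max_tracked:
            counts[v] += 1
        else:
            overflow += 1
    # Treat the variable as numeric only if every value is a non-boolean number.
    is_numeric = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in values
    )
    return is_numeric, counts, overflow


def explore_role(values, high_cardinality_threshold=3, max_tracked=5):
    """Apply the decision steps stated in the abstract (thresholds are hypothetical)."""
    is_numeric, counts, overflow = summarize(values, max_tracked)

    # Not high cardinality: the abstract assigns the class variable role.
    if len(counts) <= high_cardinality_threshold:
        return {"role": "class"}

    # High cardinality: the source text is truncated past this point,
    # so no role is assigned here; only the intermediate checks are reported.
    result = {"role": None, "high_cardinality": True, "numeric": is_numeric}
    if not is_numeric:
        # ASSUMPTION: a tracked value is "rare" if it occurs fewer times
        # than the overflow count. The abstract does not give this rule.
        result["rare_visible_values"] = any(c < overflow for c in counts.values())
    return result
```

For example, `explore_role(["a", "b", "a", "b"])` yields the class role because only two distinct values were seen, while `explore_role(list("abcdefgh"))` is flagged as high cardinality and non-numeric, and the rare-value check runs.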

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.