Patent · US Active

Accelerating data profiling process

US8719271B2 · kind B2 · utility

Cited by: 1
References: 7
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Oct 5, 2012
Grant date: May 6, 2014
Priority date
Expiry date: Oct 10, 2032

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/221
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A data profiling request is handled by utilizing data in a distributed file system. Tabular data is extracted from a data source and stored in the distributed file system. Each table in the tabular data is split by columns, and each column is stored in a separate file on a set of physical nodes of the distributed file system. In response to a data profiling request, a master node determines, based on the request, which groups of files need to reside on the same physical node in order to perform the profiling analysis. The master node then creates jobs on the physical nodes that contain the requisite files for each job.
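
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. All function names, the round-robin placement, and the rule for choosing a node (the node that already holds the most files of a group, with the remaining files reported as missing) are assumptions made for illustration only.

```python
from collections import Counter

def split_by_columns(table_name, header, rows):
    """Split a table into one 'file' per column, keyed by (table, column)."""
    return {
        (table_name, col): [row[i] for row in rows]
        for i, col in enumerate(header)
    }

def place_files(column_files, nodes):
    """Assumed placement policy: assign column files to nodes round-robin."""
    return {
        key: nodes[i % len(nodes)]
        for i, key in enumerate(sorted(column_files))
    }

def plan_jobs(groups, placement):
    """Master-node step (hypothetical): for each group of column files that a
    profiling task needs together, pick the node holding most of them and
    list any files that node does not yet hold."""
    jobs = []
    for group in groups:
        counts = Counter(placement[f] for f in group)
        # Prefer the node with the most files; break ties by node name.
        node = min(counts, key=lambda n: (-counts[n], n))
        missing = [f for f in group if placement[f] != node]
        jobs.append({"node": node, "files": list(group), "missing": missing})
    return jobs
```

Here a profiling request is modeled as a list of column-file groups, for example one group per single-column statistic and one group per cross-column check. A job whose `missing` list is empty can run entirely on data local to its node.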

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.