Patent · US Active

Systems and methods for management of data platforms

US11281626B2 · kind B2 · utility

Cited by: 1
References: 19
Claims: 14
Family size: 0

Assignee

Inventor

Key dates

Filing date: May 17, 2019
Grant date: Mar 22, 2022
Priority date
Expiry date: Jun 4, 2040

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F40/211
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

In a system for analyzing large data sets, a document/file format can be discovered by attempting to parse the file using several parsers to generate a schema, assigning a score to each parsing, and selecting a parser based on the assigned scores. Schema element attributes, such as statistical parameters, can be derived and used in identifying schema elements associated with other files. Attributes of identified schema elements can be used to substitute missing data values with values based on such attributes. Data values corresponding to schema elements can be selected and highlighted, and schema elements and/or attributes thereof can be highlighted based on selected data values. From a cluster of files, a lineage relationship between file pairs, indicating whether one file is derived from another, can be determined for several files. In reducing/compacting data, utilization of all available reducers can be optimized according to current utilization of one or more reducers.
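
The format-discovery step in the abstract (try several parsers, score each parsing, select a parser by score) can be illustrated with a minimal Python sketch. This is not the patented method: the abstract does not say how scores are computed, so the scoring heuristic below is an assumption. It multiplies the fraction of rows that match the inferred schema by the fraction of field names that look like plain identifiers. The names PARSERS, score_parse, and discover_format are hypothetical and chosen only for illustration.

```python
import csv
import io
import json
import re

# Assumption: a "clean" field name looks like a plain identifier.
IDENTIFIER = re.compile(r"^[A-Za-z_]\w*$")


def parse_jsonl(text):
    """Parse JSON Lines: one JSON object per non-blank line."""
    rows = [json.loads(line) for line in text.splitlines() if line.strip()]
    if not all(isinstance(row, dict) for row in rows):
        raise ValueError("every line must be a JSON object")
    return rows


def make_delimited_parser(delimiter):
    """Build a parser for delimiter-separated text with a header row."""
    def parse(text):
        reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
        return [dict(row) for row in reader]
    return parse


# Hypothetical parser registry; a real system would register many more.
PARSERS = {
    "jsonl": parse_jsonl,
    "csv": make_delimited_parser(","),
    "tsv": make_delimited_parser("\t"),
}


def score_parse(rows):
    """Assumed heuristic: row consistency times field-name cleanliness."""
    if not rows or not rows[0]:
        return 0.0
    schema = set(rows[0])
    # A row is consistent if it has exactly the schema's fields, all filled.
    consistent = sum(
        1 for row in rows
        if set(row) == schema and None not in row.values()
    )
    clean = sum(
        1 for name in schema
        if isinstance(name, str) and IDENTIFIER.match(name)
    )
    return (consistent / len(rows)) * (clean / len(schema))


def discover_format(text, parsers=PARSERS):
    """Try every parser, score each parsing, return the best-scoring one."""
    results = {}
    for name, parser in parsers.items():
        try:
            rows = parser(text)
        except (ValueError, csv.Error):
            rows = []  # a failed parse scores zero
        results[name] = (score_parse(rows), rows)
    best = max(results, key=lambda name: results[name][0])
    scores = {name: score for name, (score, _) in results.items()}
    return best, results[best][1], scores


if __name__ == "__main__":
    sample = "id,name\n1,Ada\n2,Grace\n"
    best, rows, scores = discover_format(sample)
    print(best, scores)
```

Multiplying the two fractions means a parser that splits the file into consistent rows but produces implausible field names (for example, a comma parser applied to JSON Lines) still scores zero. Ties are resolved by registry order, which is another simplification of this sketch.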

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.