Systems and methods for management of data platforms
US11281626B2 · kind B2 · utility
Key dates
| Event | Date |
|---|---|
| Filing date | May 17, 2019 |
| Grant date | Mar 22, 2022 |
| Priority date | — |
| Expiry date | Jun 4, 2040 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F40/211
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
In a system for analyzing large data sets, the format of a document or file can be discovered by attempting to parse the file with several parsers, each of which generates a schema, assigning a score to each parsing, and selecting a parser based on the assigned scores. Attributes of schema elements, such as statistical parameters, can be derived and used to identify corresponding schema elements in other files. Attributes of the identified schema elements can also be used to replace missing data values with values derived from those attributes. Data values corresponding to schema elements can be selected and highlighted, and, conversely, schema elements and/or their attributes can be highlighted based on selected data values. For several pairs of files within a cluster, a lineage relationship indicating whether one file is derived from the other can be determined. When reducing or compacting data, the utilization of all available reducers can be optimized based on the current utilization of one or more of those reducers.
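The abstract does not say how each parsing is scored. As a minimal sketch, assuming three hypothetical candidate parsers (comma-delimited, tab-delimited, and JSON Lines) and a made-up score (number of inferred schema fields × fraction of populated cells × fraction of "clean" field names), the parse, score, and select step could look like this in Python. None of the names below come from the patent.

```python
import csv
import io
import json
import re
from typing import Callable, Dict, List, Optional, Tuple

# A record maps field name -> string value; csv.DictReader may add a None key
# for surplus columns, so keys are Optional here.
Record = Dict[Optional[str], Optional[str]]


# Hypothetical candidate parsers: each turns raw text into records or raises.
def parse_delimited(text: str, delimiter: str) -> List[Record]:
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


def parse_jsonl(text: str) -> List[Record]:
    records: List[Record] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        obj = json.loads(line)  # raises json.JSONDecodeError (a ValueError)
        if not isinstance(obj, dict):
            raise ValueError("JSON line is not an object")
        records.append({k: str(v) for k, v in obj.items()})
    return records


PARSERS: Dict[str, Callable[[str], List[Record]]] = {
    "csv": lambda t: parse_delimited(t, ","),
    "tsv": lambda t: parse_delimited(t, "\t"),
    "jsonl": parse_jsonl,
}

# Hypothetical heuristic: a "clean" field name uses only word characters,
# spaces, dots, or hyphens (no braces, quotes, colons, commas, or tabs).
CLEAN_NAME = re.compile(r"^[\w .-]+$")


def infer_schema(records: List[Record]) -> List[str]:
    """Schema = ordered union of the field names seen across all records."""
    fields: List[str] = []
    for rec in records:
        for name in rec:
            if name is not None and name not in fields:
                fields.append(name)
    return fields


def score(records: List[Record]) -> float:
    """Hypothetical score: field count x cell completeness x clean-name ratio."""
    schema = infer_schema(records)
    if not records or not schema:
        return 0.0
    filled = sum(1 for rec in records for f in schema
                 if rec.get(f) not in (None, ""))
    completeness = filled / (len(records) * len(schema))
    clean = sum(1 for f in schema if CLEAN_NAME.match(f)) / len(schema)
    return len(schema) * completeness * clean


def discover_format(text: str) -> Tuple[Optional[str], float]:
    """Parse with every candidate, score each result, keep the best parser."""
    best_name: Optional[str] = None
    best_score = 0.0
    for name, parser in PARSERS.items():
        try:
            s = score(parser(text))
        except (ValueError, csv.Error):
            s = 0.0  # a parser that fails outright scores zero
        if s > best_score:
            best_name, best_score = name, s
    return best_name, best_score


print(discover_format("name,age\nann,\nbob,27\n"))
```

The score is multiplicative so that a parse which collapses each line into a single column, or which produces field names full of punctuation (as happens when JSON text is split on commas), ranks below a parse that recovers several clean, well-populated fields. A production system would likely weigh richer evidence, for example the statistical attributes of schema elements that the abstract also describes.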
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.