Sampling for preprocessing big data based on features of transformation results
US10459942B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Apr 29, 2016 |
| Grant date | Oct 29, 2019 |
| Priority date | — |
| Expiry date | Apr 30, 2037 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/26
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A system determines samples of datasets that are typically processed by big data analysis systems. The samples are used to develop and test transformations that preprocess the datasets in preparation for analysis by big data systems. The system receives one or more transform operations and input datasets for the transform operations. The system determines samples associated with the transform operations. According to one sampling strategy, the system determines samples that return at least a threshold number of records in the result set obtained by applying a transformation. According to another sampling strategy, the system receives criteria describing the result of the transform operations and determines sample sets that generate result sets satisfying the criteria when the transform operations are applied.
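The two sampling strategies in the abstract can be illustrated with a short Python sketch. This is not the patented method: the function `grow_sample`, its parameters, and the grow-by-random-batches approach are assumptions chosen for illustration. The sketch enlarges a random sample until the result set of a transformation passes a caller-supplied check, which can be either a minimum record count or an arbitrary criterion on the result.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def grow_sample(
    dataset: Sequence[T],
    transform: Callable[[List[T]], List[R]],
    accept: Callable[[List[R]], bool],
    batch_size: int = 50,
    seed: int = 0,
) -> List[T]:
    """Hypothetical helper: grow a random sample until accept(transform(sample)) holds."""
    rng = random.Random(seed)
    order = list(range(len(dataset)))
    rng.shuffle(order)  # visit records in a reproducible random order

    sample: List[T] = []
    for start in range(0, len(order), batch_size):
        # Add the next batch of randomly chosen records to the sample.
        sample.extend(dataset[i] for i in order[start:start + batch_size])
        # Apply the transformation to the sample and check the result set.
        if accept(transform(sample)):
            return sample
    raise ValueError("no prefix of the shuffled dataset satisfied the requirement")


# Illustrative data: 1000 records, one in ten from "US".
records = [{"id": i, "country": "US" if i % 10 == 0 else "DE"} for i in range(1000)]


def keep_us(rows):
    """Example transformation: a filter that keeps only US records."""
    return [r for r in rows if r["country"] == "US"]


# Strategy 1: the result set must contain at least a threshold number of records.
by_threshold = grow_sample(records, keep_us, lambda out: len(out) >= 20)

# Strategy 2: the result set must satisfy a criterion describing the result.
by_criteria = grow_sample(records, keep_us, lambda out: any(r["id"] >= 500 for r in out))
```

Passing the acceptance check as a function lets one loop serve both strategies. A production system would likely avoid re-running the transformation on the whole sample after every batch.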
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.