Stratified sampling using adaptive parallel data processing
US9697277B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Jul 13, 2016 |
| Grant date | Jul 4, 2017 |
| Priority date | — |
| Expiry date | Jul 13, 2036 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/9535
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A computer-implemented method includes partitioning a plurality of records into a plurality of splits. Each split includes at least a portion of the plurality of records. The method further includes providing at least one split of the plurality of splits to a mapper. The mapper scans the input data set, transforms each input record using a map function, and extracts a grouping key in parallel. The method further includes assigning at least a portion of the records of the at least one split to a group, where each assignment to the group is based on a stratum of the assigned record, and filtering the records of the group, where each filtering is based on a comparison of a weight of a record to a local threshold of the mapper. The method further includes shuffling the group to a reducer and providing a stratified sampling of the plurality of records based on the group.
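The pipeline in the abstract (partition, map, group by stratum, filter against a local threshold, shuffle, reduce) can be illustrated with a minimal single-process Python sketch. The abstract does not say how weights or thresholds are computed, so this sketch makes its own assumptions: each record gets a uniform random weight, and a mapper's local threshold for a stratum is the largest of the `k` smallest weights it has seen so far. All function names (`partition`, `mapper`, `shuffle`, `reducer`, `stratified_sample`) are hypothetical, and the mappers run sequentially here rather than in parallel.

```python
import heapq
import random
from collections import defaultdict


def partition(records, num_splits):
    """Partition the records into splits (round-robin, an assumption)."""
    return [records[i::num_splits] for i in range(num_splits)]


def mapper(split, key_fn, k, rng):
    """Group records by stratum and keep at most k records per stratum.

    Assumption: the weight is a uniform random number, and the local
    threshold is the largest weight among the k records currently kept.
    """
    groups = defaultdict(list)  # stratum -> max-heap (weights negated)
    for seq, record in enumerate(split):
        stratum = key_fn(record)      # grouping key extracted by the mapper
        weight = rng.random()
        heap = groups[stratum]
        if len(heap) < k:
            heapq.heappush(heap, (-weight, seq, record))
        elif weight < -heap[0][0]:    # weight is below the local threshold
            heapq.heapreplace(heap, (-weight, seq, record))
    # Emit (weight, record) pairs per stratum for the shuffle step.
    return {s: [(-nw, r) for nw, _, r in h] for s, h in groups.items()}


def shuffle(mapper_outputs):
    """Route every surviving record of a stratum to the same reducer input."""
    merged = defaultdict(list)
    for output in mapper_outputs:
        for stratum, weighted in output.items():
            merged[stratum].extend(weighted)
    return merged


def reducer(weighted_records, k):
    """Keep the k records with the smallest weights across all mappers."""
    smallest = heapq.nsmallest(k, weighted_records, key=lambda wr: wr[0])
    return [record for _, record in smallest]


def stratified_sample(records, key_fn, k, num_splits=4, seed=0):
    """Return a dict mapping each stratum to a sample of up to k records."""
    rng = random.Random(seed)
    outputs = [mapper(s, key_fn, k, rng) for s in partition(records, num_splits)]
    return {s: reducer(w, k) for s, w in shuffle(outputs).items()}


records = [{"id": i, "region": "A" if i % 3 else "B"} for i in range(100)]
sample = stratified_sample(records, lambda r: r["region"], k=5)
```

Under these assumptions, filtering in the mapper only reduces the data sent through the shuffle: a record that is not among the `k` smallest weights of its stratum on one mapper cannot be among the `k` smallest overall, so the reducer's result is the same as if no local filtering had taken place.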
Source: USPTO / EPO open patent data. Objective bibliographic data and citation counts.