Patent · US Active

Stratified sampling using adaptive parallel data processing

US9697277B2 · kind B2 · utility

Cited by: 0
References: 2
Claims: 18
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jul 13, 2016
Grant date: Jul 4, 2017
Priority date
Expiry date: Jul 13, 2036

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/9535
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A computer-implemented method includes partitioning a plurality of records into a plurality of splits. Each split includes at least a portion of the plurality of records. The method further includes providing at least one split of the plurality of splits to a mapper. The mapper scans the input data set, transforms each input record using a map function, and extracts a grouping key in parallel. The method further includes assigning at least a portion of the records of the at least one split to a group, each assignment being based on a stratum of the assigned record, and filtering the records of the group, each filtering being based on a comparison of a weight of a record to a local threshold of the mapper. The method further includes shuffling the group to a reducer and providing a stratified sampling of the plurality of records based on the group.
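
The flow described in the abstract (partition, map, filter against a local threshold, shuffle, reduce) can be illustrated with a minimal Python sketch. This is not the patented method. The abstract does not say how weights or thresholds are computed, so the sketch assumes a common bottom-k scheme: each record gets a uniform random weight, and each mapper's local threshold for a stratum is the k-th smallest weight it has seen so far. The names `map_split`, `shuffle`, `reduce_group`, and `stratified_sample` are hypothetical, and the mappers run sequentially here, whereas a real deployment would send each split to a separate worker.

```python
import heapq
import random
from collections import defaultdict


def map_split(split, stratum_of, k, rng):
    """Mapper: group records by stratum and keep at most k per stratum.

    Assumption (not from the patent): weights are uniform random numbers and
    the local threshold is the largest weight among the k currently kept.
    """
    kept = defaultdict(list)  # stratum -> max-heap of (-weight, seq, record)
    for seq, record in enumerate(split):
        stratum = stratum_of(record)      # extract the grouping key
        weight = rng.random()             # assumed weighting scheme
        heap = kept[stratum]
        if len(heap) < k:
            heapq.heappush(heap, (-weight, seq, record))
        elif weight < -heap[0][0]:        # compare weight to local threshold
            heapq.heapreplace(heap, (-weight, seq, record))
    # Emit (weight, record) pairs per stratum for the shuffle step.
    return {s: [(-nw, rec) for nw, _, rec in h] for s, h in kept.items()}


def shuffle(mapper_outputs):
    """Shuffle: bring all surviving records of one stratum together."""
    grouped = defaultdict(list)
    for output in mapper_outputs:
        for stratum, pairs in output.items():
            grouped[stratum].extend(pairs)
    return grouped


def reduce_group(pairs, k):
    """Reducer: keep the k records with the smallest weights overall."""
    return [rec for _, rec in heapq.nsmallest(k, pairs, key=lambda p: p[0])]


def stratified_sample(records, stratum_of, k, num_splits=4, seed=0):
    """Sample up to k records per stratum (k must be at least 1)."""
    rng = random.Random(seed)
    splits = [records[i::num_splits] for i in range(num_splits)]  # partition
    outputs = [map_split(s, stratum_of, k, rng) for s in splits]
    grouped = shuffle(outputs)
    return {s: reduce_group(pairs, k) for s, pairs in grouped.items()}
```

For example, calling `stratified_sample(records, lambda r: r["region"], k=3)` on a list of dictionaries with a `region` field returns a dictionary that maps each region to at most three sampled records. Filtering in the mapper means each mapper forwards at most k records per stratum, which limits the data moved during the shuffle.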

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.