Patent · US Active

Automated load-balancing of partitions in arbitrarily imbalanced distributed mapreduce computations

US9613127B1 · kind B1 · utility

40Cited by

6References

19Claims

0Family size

Assignee

Quantcast Corporation · US

Inventors

Silvius V. Rus · Orinda, US
Wei Jiang · Sunnyvale, US

Key dates

Filing date	Jun 30, 2014
Grant date	Apr 4, 2017
Priority date	—
Expiry date	Apr 3, 2035

Classification

Technology area (CPC G)Physics
CPC primaryG06F16/278
WIPO fieldComputer technology
WIPO sectorElectrical engineering

Abstract

A distributed computing system executes a MapReduce job on streamed data that includes an arbitrary amount of imbalance with respect to the frequency distribution of the data keys in the dataset. A map task module maps the dataset to a coarse partitioning, and generates a list of the top K keys with the highest frequency among the dataset. A sort task module employs a plurality of sorters to read the coarse partitioning and sort the data into buckets by data key. The values for the top K most frequent keys are separated into single-key buckets. The other less frequently occurring keys are assigned to buckets that each have multiple keys assigned to it. Then, more than one worker is assigned to each single-key bucket. The output of the multiple workers assigned to each respective single-key bucket is stitched together.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.