Patent · US · Expired

Distributing data items to corresponding buckets for use in parallel operations

US6978458B1 · kind B1 · utility

Cited by: 19
References: 7
Claims: 26
Family size: 0

Assignee

Inventors

Key dates

Filing date: Nov 17, 2000
Grant date: Dec 20, 2005
Priority date
Expiry date: Aug 19, 2022

Classification

  • Technology area (CPC Y): Emerging Cross-Sectional Technologies
  • CPC primary: Y10S707/99937
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Techniques are provided for evenly distributing data items of a particular set of data to a plurality of buckets. The buckets of data items may then be assigned to processes to perform operations on the data items in parallel with the other processes. In one embodiment, the set of data (which may come from tables or be the result set of a previous operation) is divided into a plurality of subsets. For each subset of the plurality of subsets, a sample of data items is randomly selected. The sampling itself may be performed in parallel, with each sampling process using a different seed to randomize its selection of samples. The sampled data items are sorted and ranges are determined based on distribution keys of the sampled data items. The ranges are assigned to buckets, and the data items are then distributed to the buckets assigned to the range into which their distribution key falls.
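
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It only follows the outline given above: split the data into subsets, sample each subset with its own seed, sort the samples, derive ranges from the sampled distribution keys, and route each item to the bucket for its range. All function names, the default subset count and sample size, and the choice of evenly spaced sample positions as range boundaries are assumptions made for illustration. The sampling loop here runs sequentially, whereas the abstract notes it may be performed in parallel.

```python
import bisect
import random


def split_into_subsets(items, num_subsets):
    """Divide the data set into roughly equal contiguous subsets."""
    size = max(1, -(-len(items) // num_subsets))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]


def sample_subset(subset, sample_size, seed):
    """Randomly select a sample from one subset, using that subset's own seed."""
    rng = random.Random(seed)
    return rng.sample(subset, min(sample_size, len(subset)))


def range_boundaries(samples, num_buckets, key):
    """Sort the sampled keys and pick up to num_buckets - 1 range boundaries.

    Assumption: boundaries are taken at evenly spaced positions in the
    sorted sample, so each range covers a similar share of the sample.
    """
    keys = sorted(key(item) for item in samples)
    if not keys:
        return []
    return [keys[(i * len(keys)) // num_buckets] for i in range(1, num_buckets)]


def distribute(items, num_buckets, num_subsets=4, sample_size=32, key=lambda x: x):
    """Distribute items to buckets by the range their distribution key falls in."""
    subsets = split_into_subsets(items, num_subsets)

    # Each subset is sampled with a different seed (here, its index).
    samples = []
    for seed, subset in enumerate(subsets):
        samples.extend(sample_subset(subset, sample_size, seed))

    bounds = range_boundaries(samples, num_buckets, key)

    # Bucket i receives keys in [bounds[i-1], bounds[i]); the last bucket is open-ended.
    buckets = [[] for _ in range(num_buckets)]
    for item in items:
        buckets[bisect.bisect_right(bounds, key(item))].append(item)
    return buckets


if __name__ == "__main__":
    data = list(range(1000))
    random.Random(42).shuffle(data)
    for index, bucket in enumerate(distribute(data, num_buckets=4)):
        print(index, len(bucket))
```

Because the boundaries come from a sample rather than from the full data set, the bucket sizes are only approximately equal. A larger sample size per subset should tighten the balance at the cost of more sorting work.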

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.