Distributing data items to corresponding buckets for use in parallel operations
US6978458B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Nov 17, 2000 |
| Grant date | Dec 20, 2005 |
| Priority date | — |
| Expiry date | Aug 19, 2022 |
Classification
- Technology area (CPC Y)Emerging Cross-Sectional Technologies
- CPC primaryY10S707/99937
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
Techniques are provided for evenly distributing data items of a particular set of data to a plurality of buckets. The buckets of data items may then be assigned to processes to perform operations on the data items in parallel with the other processes. In one embodiment, the set of data (which may come from tables or be the result set of a previous operation) is divided into a plurality of subsets. For each subset of the plurality of subsets, a sample of data items is randomly selected. The sampling itself may be performed in parallel, with each sampling process using a different seed to randomize its selection of samples. The sampled data items are sorted and ranges are determined based on distribution keys of the sampled data items. The ranges are assigned to buckets, and the data items are then distributed to the buckets assigned to the range into which their distribution key falls.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.