Patent · US Active

Management of intermediate data spills during the shuffle phase of a map-reduce job

US9424274B2 · kind B2 · utility

4Cited by
4References
20Claims
0Family size

Assignee

Inventors

Key dates

Filing dateJun 3, 2013
Grant dateAug 23, 2016
Priority date
Expiry dateDec 30, 2033

Classification

  • Technology area (CPC G)Physics
  • CPC primaryG06F16/24578
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

A system and a method for spill management during the shuffle phase of a map-reduce job performed in a distributed computer system on distributed files. A spilling protocol is provided for handling the spilling of intermediate data based on at least one popularity attribute of key-value pairs of the input data on which the map-reduce job is performed. The spilling protocol includes an assignment order to storage resources belonging to the computer system based on the at least one popularity attribute. The protocol can be deployed in computer systems with heterogeneous storage resources. Additionally, pointers or tags can be assigned to improve shuffle phase performance. The distributed file systems that are most suitable are ones usable by Hadoop, e.g., Hadoop Distributed File System (HDFS).

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.