Patent · US Active

Management of intermediate data spills during the shuffle phase of a map-reduce job

US9740706B2 · kind B2 · utility

Cited by: 12
References: 4
Claims: 20
Family size: 0

Assignee

Inventors

Key dates

Filing date: Jun 21, 2016
Grant date: Aug 22, 2017
Priority date
Expiry date: Jun 21, 2036

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F16/24578
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A system and a method are provided for spill management during the shuffle phase of a map-reduce job performed in a distributed computer system on distributed files. A spilling protocol handles the spilling of intermediate data based on at least one popularity attribute of the key-value pairs of the input data on which the map-reduce job is performed. The spilling protocol includes an assignment order to storage resources belonging to the computer system, based on the at least one popularity attribute. The protocol can be deployed in computer systems with heterogeneous storage resources. Additionally, pointers or tags can be assigned to improve shuffle-phase performance. The most suitable distributed file systems are those usable by Hadoop, e.g., the Hadoop Distributed File System (HDFS).
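
The abstract does not spell out how the assignment order is computed, so the following is only a rough sketch of one way a popularity-driven spill assignment could look. It assumes that "popularity" means how often a key occurs among the key-value pairs, that more popular keys should be placed on faster storage, and that each storage tier has a capacity counted in key-value pairs. The names popularity, assign_spills, "ssd" and "hdd" are hypothetical and do not come from the patent.

```python
from collections import Counter


def popularity(pairs):
    """Count how often each key occurs among the key-value pairs.

    Assumption: occurrence count stands in for the popularity attribute.
    """
    return Counter(key for key, _ in pairs)


def assign_spills(pairs, tiers):
    """Map each key to a storage tier, most popular keys first.

    tiers is a list of (name, capacity) tuples ordered fastest first.
    Capacity is counted in key-value pairs (a hypothetical unit).
    """
    counts = popularity(pairs)
    remaining = [capacity for _, capacity in tiers]
    assignment = {}
    # Visit keys in descending popularity; break ties by key text
    # so the result is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0])))
    for key, n in ordered:
        # First fit: take the fastest tier that still has room for all
        # pairs of this key.
        for i, (name, _) in enumerate(tiers):
            if remaining[i] >= n:
                remaining[i] -= n
                assignment[key] = name
                break
        else:
            raise ValueError(f"no tier can hold {n} pairs for key {key!r}")
    return assignment


pairs = [("the", 1), ("the", 1), ("the", 1), ("cat", 1), ("cat", 1), ("sat", 1)]
tiers = [("ssd", 3), ("hdd", 10)]  # heterogeneous storage, fastest first
result = assign_spills(pairs, tiers)
print(result)
```

With these inputs the most frequent key, "the", fills the fast tier and the remaining keys fall through to the slower one. A real system would presumably measure capacity in bytes and might use other popularity attributes; the first-fit policy here is just one simple choice.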

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.