Management of intermediate data spills during the shuffle phase of a map-reduce job
US9740706B2 · kind B2 · utility
Key dates
| Event | Date |
|---|---|
| Filing date | Jun 21, 2016 |
| Grant date | Aug 22, 2017 |
| Priority date | — |
| Expiry date | Jun 21, 2036 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F16/24578
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A system and a method for spill management during the shuffle phase of a map-reduce job performed in a distributed computer system on distributed files. A spilling protocol is provided for handling the spilling of intermediate data based on at least one popularity attribute of the key-value pairs of the input data on which the map-reduce job is performed. The spilling protocol includes an order for assigning intermediate data to storage resources belonging to the computer system, based on the at least one popularity attribute. The protocol can be deployed in computer systems with heterogeneous storage resources. Additionally, pointers or tags can be assigned to improve shuffle phase performance. The most suitable distributed file systems are those usable by Hadoop, e.g., the Hadoop Distributed File System (HDFS).
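The abstract does not spell out how popularity drives the assignment order, so the following is only a minimal sketch under stated assumptions, not the patented protocol. It assumes that (1) a key's popularity is its frequency among the intermediate key-value pairs, (2) the heterogeneous storage resources are given as a list ordered fastest first (e.g. SSD before HDD), each with a record capacity, and (3) each key is placed first-fit on the fastest tier that still has room. The returned key-to-tier map stands in for the "pointers or tags" a reducer could consult during shuffle. `StorageTier` and `spill_by_popularity` are hypothetical names introduced for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class StorageTier:
    """Hypothetical storage resource; tiers are passed in fastest-first order."""
    name: str
    capacity: int  # assumed limit: maximum number of records this tier accepts
    records: list = field(default_factory=list)


def spill_by_popularity(pairs, tiers):
    """Spill (key, value) pairs to tiers, most popular keys first.

    Assumption: popularity = how often a key occurs in `pairs`.
    Returns a tag table {key: tier name} usable to locate a key's
    records during the shuffle phase.
    """
    popularity = Counter(key for key, _ in pairs)

    # Group values by key so all records of one key land on the same tier.
    grouped = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)

    # Assignment order: descending popularity; ties broken by key for determinism.
    order = sorted(grouped, key=lambda k: (-popularity[k], k))

    tags = {}
    for key in order:
        recs = [(key, v) for v in grouped[key]]
        # First fit: the fastest tier with enough free capacity receives the key.
        for tier in tiers:
            if len(tier.records) + len(recs) <= tier.capacity:
                tier.records.extend(recs)
                tags[key] = tier.name
                break
        else:
            raise RuntimeError(f"no storage tier has room for key {key!r}")
    return tags


if __name__ == "__main__":
    tiers = [StorageTier("ssd", capacity=3), StorageTier("hdd", capacity=10)]
    pairs = [("a", 1), ("b", 1), ("a", 1), ("c", 1), ("a", 1), ("b", 1)]
    print(spill_by_popularity(pairs, tiers))
    # → {'a': 'ssd', 'b': 'hdd', 'c': 'hdd'}
```

In this example, key `a` (three occurrences) fills the fast tier, so the less popular keys `b` and `c` spill to the slower one. Keeping all records of a key on one tier means that each tag resolves to a single location. A real system would likely measure capacity in bytes and might estimate popularity from samples rather than from exact counts.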
Source: USPTO / EPO open patent data.