Patent · US Active

Method, distributed system and computer program for failure recovery

US8880931B2 · kind B2 · utility

Cited by: 7
References: 3
Claims: 15
Family size: 0

Assignee

Inventor

Key dates

Filing date: Dec 24, 2010
Grant date: Nov 4, 2014
Priority date
Expiry date: Jul 3, 2031

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F11/2097
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A distributed system includes: nodes, each having a memory, running distributed processes, and performing checkpointing to create checkpoint data for each process; a selection unit that selects spare nodes for future failure recovery for each process; an allocation unit that allocates and transmits the checkpoint data to the spare nodes so that the spare nodes store the checkpoint data before a failure occurs; and a recovery unit that either selects one checkpoint data for recovery and activates it to run a process on the spare node, or, when no checkpoint data is suitable for recovery, partitions the existing stored checkpoint data such that the partitions as a whole can be integrated into one complete checkpoint data, transmits the partitions from the spare nodes to a new node, and reorganizes the partitions into complete data that is activated to run a process on the new node.
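
The recovery flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `SpareNode`, `allocate`, `partition`, `recover`, the `suitable` predicate, and the `"new-node"` label are all hypothetical, and splitting the checkpoint into equal byte slices (one per surviving spare node) is an assumption made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SpareNode:
    """Hypothetical spare node holding pre-stored checkpoints."""
    name: str
    healthy: bool = True
    store: Dict[str, bytes] = field(default_factory=dict)  # process id -> checkpoint


def allocate(checkpoint: bytes, pid: str, spares: List[SpareNode]) -> None:
    """Copy the checkpoint to every selected spare node before any failure."""
    for node in spares:
        node.store[pid] = checkpoint


def partition(data: bytes, k: int) -> List[bytes]:
    """Split data into k contiguous slices (assumed scheme: equal-sized slices)."""
    size = -(-len(data) // k)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(k)]


def recover(
    pid: str,
    spares: List[SpareNode],
    suitable: Callable[[SpareNode], bool],
) -> Tuple[str, bytes]:
    """Return (node name, checkpoint) on which the process would be restarted."""
    holders = [n for n in spares if n.healthy and pid in n.store]
    if not holders:
        raise RuntimeError(f"no stored checkpoint for process {pid}")

    # Case 1: some spare node can run the process directly from its stored copy.
    for node in holders:
        if suitable(node):
            return node.name, node.store[pid]

    # Case 2: no stored copy is suitable. Each holder contributes a different
    # slice; the new node reassembles the slices into the complete checkpoint.
    k = len(holders)
    parts = [partition(n.store[pid], k)[i] for i, n in enumerate(holders)]
    return "new-node", b"".join(parts)
```

In this sketch, sending one slice per spare node spreads the transfer across several senders instead of having a single node transmit the whole checkpoint, which is one plausible reading of why the abstract partitions the data.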

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.