Patent · US · Expired

Methods and systems for reconstructing the state of a computation

US5712971A · kind A · utility

Cited by: 179
References: 8
Claims: 12
Family size: 0

Assignee

Inventors

Key dates

Filing date: Dec 11, 1995
Grant date: Jan 27, 1998
Priority date
Expiry date: Dec 11, 2015

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F11/1474
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

Methods and systems for running and checkpointing parallel and distributed applications that require neither modification of the programs used in the system nor changes to the underlying operating system. One embodiment of the invention includes the following general steps: (1) starting an application on a parallel processing system; (2) controlling processes for the application, including recording commands and responses; (3) controlling a commit protocol; (4) detecting failures of the application; and (5) continuing execution of the application from the most recently committed transaction after "replaying" the recorded commands and responses. A second embodiment comprises the following general steps: (1) starting an application on a parallel processing system; (2) controlling processes for the application, including recurrently recording the memory image of a driver program that controls the application; (3) controlling a commit protocol; (4) detecting failures of the application; and (5) continuing execution of the application from the most recently committed transaction after "restoring" the recorded memory image of the driver program.
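
The record-and-replay idea of the first embodiment can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `ReplayLog`, `Channel`, `driver`, and `run_with_recovery` are hypothetical, the "worker" is a deterministic in-process function standing in for remote processes, and the commit protocol is reduced to a single marker in an in-memory log. The sketch also assumes that any uncommitted work on the worker side has already been rolled back, so re-issuing those commands after recovery is safe.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class ReplayLog:
    """Hypothetical log of (command, response) pairs with a commit marker."""
    entries: List[Tuple[str, int]] = field(default_factory=list)
    committed: int = 0  # number of entries covered by the last commit

    def commit(self) -> None:
        self.committed = len(self.entries)

    def rollback(self) -> None:
        # Discard everything recorded after the last commit.
        del self.entries[self.committed:]


class Channel:
    """Answers commands from the log while replaying, then runs them live."""

    def __init__(self, log: ReplayLog, execute: Callable[[str], int]) -> None:
        self.log = log
        self.execute = execute
        self.cursor = 0  # index of the next log entry to replay

    def send(self, command: str) -> int:
        if self.cursor < len(self.log.entries):
            # Replay: return the recorded response without re-executing.
            recorded_command, response = self.log.entries[self.cursor]
            if recorded_command != command:
                raise RuntimeError("driver diverged from the recorded run")
        else:
            # Live: execute the command and record command and response.
            response = self.execute(command)
            self.log.entries.append((command, response))
        self.cursor += 1
        return response


def driver(channel: Channel, steps: int, fail_at: Optional[int] = None) -> int:
    """Unmodified driver logic: it only ever talks to the channel."""
    total = 0
    for i in range(steps):
        if i == fail_at:
            raise RuntimeError("simulated failure")
        total += channel.send(f"work {i}")
        if i % 2 == 1:
            channel.log.commit()  # commit after every second command
    return total


def run_with_recovery(steps: int, fail_at: int):
    live_calls: List[str] = []

    def worker(command: str) -> int:
        live_calls.append(command)  # track what was really executed
        return len(command)  # deterministic stand-in for real work

    log = ReplayLog()
    try:
        driver(Channel(log, worker), steps, fail_at)
    except RuntimeError:
        log.rollback()  # fall back to the most recent commit
    # Restart the driver from the beginning; committed steps are replayed.
    total = driver(Channel(log, worker), steps)
    return total, live_calls, log


if __name__ == "__main__":
    total, live_calls, log = run_with_recovery(steps=5, fail_at=3)
    print(total, live_calls, log.committed)
```

In this sketch the restarted driver runs the same code path as the original run. Commands up to the last commit are answered from the log, which rebuilds the driver's state without touching the workers, and only the commands after the commit point are executed again.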

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.