Patent · US Expired

System of performing checkpoint/restart of a parallel program

US6401216B1 · kind B1 · utility

Cited by: 62
References: 45
Claims: 20
Family size: 0

Key dates

Filing date: Oct 29, 1998
Grant date: Jun 4, 2002
Expiry date: Oct 29, 2018

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F11/1458
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A checkpoint of a parallel program is taken in order to provide a consistent state of the program in case the program must be restarted. Each process of the parallel program is responsible for taking its own checkpoint; however, a coordinating process is responsible for deciding when each process takes its checkpoint. During checkpointing, various data are written to a checkpoint file, including, for instance, in-transit message data, the data section, file offsets, signal state, executable information, stack contents and register contents. The checkpoint file can be stored in either local or global storage; storing it in global storage facilitates migration of the program. When a parallel program is to be restarted, each process of the program initiates its own restart, and the restart logic restores the process to the state it was in when the checkpoint was taken.
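
The abstract's division of labour (a coordinating process decides when to checkpoint, while each process writes and later restores its own state) can be illustrated with a minimal single-machine Python sketch. Everything in it is an assumption for illustration: the names ProcessState, Process, Coordinator and checkpoint_path, the JSON file format, and the simulation of processes as in-memory objects are hypothetical and are not the patent's claimed implementation. Stack, register and executable contents are left out because pure Python cannot capture them portably.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field


@dataclass
class ProcessState:
    # Hypothetical subset of the per-process data named in the abstract.
    # Stack, register and executable information are deliberately omitted.
    rank: int
    data_section: dict = field(default_factory=dict)
    file_offsets: dict = field(default_factory=dict)
    signal_state: dict = field(default_factory=dict)
    in_transit: list = field(default_factory=list)  # messages sent but not yet received


def checkpoint_path(directory: str, rank: int) -> str:
    # One file per process; `directory` may be local or shared (global) storage.
    return os.path.join(directory, f"ckpt_{rank}.json")


class Process:
    def __init__(self, rank: int) -> None:
        self.state = ProcessState(rank)

    def take_checkpoint(self, directory: str) -> str:
        # Each process writes its own checkpoint file.
        path = checkpoint_path(directory, self.state.rank)
        with open(path, "w") as f:
            json.dump(asdict(self.state), f)
        return path

    @classmethod
    def restart(cls, directory: str, rank: int) -> "Process":
        # Each process initiates its own restart from its own file.
        with open(checkpoint_path(directory, rank)) as f:
            saved = json.load(f)
        proc = cls(rank)
        proc.state = ProcessState(**saved)
        return proc


class Coordinator:
    def __init__(self, processes: list) -> None:
        self.processes = processes

    def checkpoint(self, directory: str) -> list:
        # The coordinator only decides when checkpoints happen. This stand-in
        # simply asks every process in turn; a real protocol must also capture
        # in-flight messages (the in_transit field) so that the set of files
        # forms a consistent global state.
        return [p.take_checkpoint(directory) for p in self.processes]


if __name__ == "__main__":
    procs = [Process(r) for r in range(2)]
    procs[0].state.data_section["x"] = 42
    procs[1].state.in_transit.append({"src": 0, "payload": "hello"})
    with tempfile.TemporaryDirectory() as d:  # stands in for local or global storage
        Coordinator(procs).checkpoint(d)
        restored = [Process.restart(d, r) for r in range(2)]
    print(restored[0].state.data_section["x"])  # prints 42
```

Passing a directory on shared storage instead of a temporary one mirrors the abstract's point about global storage: a process restarted on a different node could read the same checkpoint file, which is what makes migration easier.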

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.