System of performing checkpoint/restart of a parallel program
US6401216B1 · kind B1 · utility
Assignee
Inventors
Key dates
| Filing date | Oct 29, 1998 |
| Grant date | Jun 4, 2002 |
| Priority date | — |
| Expiry date | Oct 29, 2018 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F11/1458
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A checkpoint of a parallel program is taken in order to provide a consistent state of the program in the event the program is to be restarted. Each process of the parallel program is responsible for taking its own checkpoint; however, the timing of when each process takes its checkpoint is the responsibility of a coordinating process. During checkpointing, various data is written to a checkpoint file. This data includes, for instance, in-transit message data, a data section, file offsets, signal state, executable information, stack contents, and register contents. The checkpoint file can be stored in either local or global storage. When it is stored in global storage, migration of the program is facilitated. When a parallel program is to be restarted, each process of the program initiates its own restart. The restart logic restores the process to the state at which the checkpoint was taken.
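
The division of labor described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names `Process` and `Coordinator`, the JSON file layout, and the file name `ckpt_rank<N>.json` are all assumptions made for illustration. The sketch models only a data section, file offsets, and in-transit messages; signal state, executable information, stack contents, and register contents have no direct Python equivalent and are left out.

```python
import json
import tempfile
from pathlib import Path


class Process:
    """One process of a hypothetical parallel program (illustrative only)."""

    def __init__(self, rank, ckpt_dir):
        self.rank = rank
        # Assumed layout: one checkpoint file per process.
        self.ckpt_path = Path(ckpt_dir) / f"ckpt_rank{rank}.json"
        self.data = {"counter": 0}  # stands in for the data section
        self.file_offsets = {}      # open file name -> byte offset
        self.in_transit = []        # messages received but not yet consumed

    def step(self):
        # Stand-in for real computation.
        self.data["counter"] += self.rank + 1

    def take_checkpoint(self):
        # Each process writes its own state; the coordinator only picks the time.
        state = {
            "data": self.data,
            "file_offsets": self.file_offsets,
            "in_transit": self.in_transit,
        }
        tmp = self.ckpt_path.with_suffix(".tmp")
        tmp.write_text(json.dumps(state))
        tmp.replace(self.ckpt_path)  # rename so a partial file is never read

    def restart(self):
        # Each process initiates its own restart from its own checkpoint file.
        state = json.loads(self.ckpt_path.read_text())
        self.data = state["data"]
        self.file_offsets = state["file_offsets"]
        self.in_transit = state["in_transit"]


class Coordinator:
    """Decides when the processes checkpoint; never writes their state."""

    def __init__(self, processes):
        self.processes = processes

    def checkpoint_all(self):
        for proc in self.processes:
            proc.take_checkpoint()


def demo():
    with tempfile.TemporaryDirectory() as ckpt_dir:
        procs = [Process(rank, ckpt_dir) for rank in range(3)]
        coordinator = Coordinator(procs)
        for proc in procs:
            proc.step()
        procs[1].in_transit.append("msg from rank 0")
        procs[2].file_offsets["out.dat"] = 128
        coordinator.checkpoint_all()
        saved = [dict(proc.data) for proc in procs]
        for proc in procs:
            proc.step()  # work done after the checkpoint is lost on failure
        # Simulate a restart: fresh process objects read the checkpoint files.
        restarted = [Process(rank, ckpt_dir) for rank in range(3)]
        for proc in restarted:
            proc.restart()
        return saved, restarted


if __name__ == "__main__":
    saved, restarted = demo()
    print([proc.data == snap for proc, snap in zip(restarted, saved)])
```

In this sketch, pointing `ckpt_dir` at storage visible to every node corresponds to the global-storage case in the abstract: a restarted process can read its checkpoint file from a different node, which is what makes migration possible. A node-local directory corresponds to the local-storage case.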
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.