Checkpointing using compute node health information
US10545839B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Dec 22, 2017 |
| Grant date | Jan 28, 2020 |
| Priority date | — |
| Expiry date | Apr 12, 2038 |
Classification
- Technology area (CPC G): Physics
- CPC primary: G06F2201/82
- WIPO field: Computer technology
- WIPO sector: Electrical engineering
Abstract
A method is disclosed, as well as an associated apparatus and computer program product, for checkpointing using a plurality of communicatively coupled compute nodes. The method comprises acquiring health information for a first node of the plurality of compute nodes, and determining a first failure probability for the first node using the health information. The first failure probability corresponds to a predetermined time interval. The method further comprises selecting a second node of the plurality of compute nodes as a partner node for the first node. The second node has a second failure probability for the time interval. A composite failure probability of the first node and the second node is less than the first failure probability. The method further comprises copying checkpoint information from the first node to the partner node.
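The partner-selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not say how health information maps to a failure probability or how the composite probability is computed, so the sketch makes two labeled assumptions: each node's health is summarized by a hypothetical `failure_rate_per_hour` with an exponential failure model, and node failures are independent, so the composite probability is the product of the two individual probabilities. All names (`Node`, `select_partner`, `copy_checkpoint`) are illustrative.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Hypothetical health summary; the abstract only says "health information".
    failure_rate_per_hour: float
    checkpoints: dict = field(default_factory=dict)


def failure_probability(node: Node, interval_hours: float) -> float:
    """Probability the node fails within the interval (assumed exponential model)."""
    return 1.0 - math.exp(-node.failure_rate_per_hour * interval_hours)


def composite_failure_probability(p_a: float, p_b: float) -> float:
    """Probability both nodes fail in the interval, assuming independent failures."""
    return p_a * p_b


def select_partner(first: Node, nodes: list, interval_hours: float):
    """Pick the node whose pairing with `first` gives the lowest composite
    probability, provided it is strictly below first's own probability."""
    p_first = failure_probability(first, interval_hours)
    best, best_p = None, p_first
    for candidate in nodes:
        if candidate is first:
            continue  # a node cannot be its own partner
        p_pair = composite_failure_probability(
            p_first, failure_probability(candidate, interval_hours)
        )
        if p_pair < best_p:
            best, best_p = candidate, p_pair
    return best, best_p


def copy_checkpoint(first: Node, partner: Node, data: bytes) -> None:
    """Stand-in for the copy step: store first's checkpoint on the partner."""
    partner.checkpoints[first.name] = data


if __name__ == "__main__":
    cluster = [Node("n0", 0.5), Node("n1", 0.01), Node("n2", 0.1)]
    partner, p_pair = select_partner(cluster[0], cluster, interval_hours=1.0)
    if partner is not None:
        copy_checkpoint(cluster[0], partner, b"state")
        print(partner.name, p_pair)
```

Under the independence assumption, any partner with a failure probability below 1 lowers the composite probability, so the sketch goes further than the abstract requires and chooses the candidate that minimizes it. A model with correlated failures (for example, nodes sharing a rack or power supply) would need a different composite formula.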
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.