Patent · US Active

Checkpointing using compute node health information

US10545839B2 · kind B2 · utility

0Cited by
10References
20Claims
0Family size

Assignee

Inventors

Key dates

Filing dateDec 22, 2017
Grant dateJan 28, 2020
Priority date
Expiry dateApr 12, 2038

Classification

  • Technology area (CPC G)Physics
  • CPC primaryG06F2201/82
  • WIPO fieldComputer technology
  • WIPO sectorElectrical engineering

Abstract

A method is disclosed, as well as an associated apparatus and computer program product, for checkpointing using a plurality of communicatively coupled compute nodes. The method comprises acquiring health information for a first node of the plurality of compute nodes, and determining a first failure probability for the first node using the health information. The first failure probability corresponds to a predetermined time interval. The method further comprises selecting a second node of the plurality of compute nodes as a partner node for the first node. The second node has a second failure probability for the time interval. A composite failure probability of the first node and the second node is less than the first failure probability. The method further comprises copying checkpoint information from the first node to the partner node.

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.