Patent · US Active

Mechanism to provide reliability through packet drop detection

US7877436B2 · kind B2 · utility

Cited by: 15
References: 4
Claims: 19
Family size: 0

Key dates

Filing date: Feb 1, 2008
Grant date: Jan 25, 2011
Expiry date: Mar 6, 2029

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F9/544
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method and a data processing system for completing checkpoint processing of a distributed job whose local tasks communicate with remote tasks through a host fabric interface (HFI) and an assigned HFI window. Each HFI window maintains a send count and a receive count, which track the GSM messages sent from and received at that window. When a master task initiates a checkpoint, each local task forwards its send count and receive count to the master task. The master task sums the send counts and the receive counts separately and compares the two totals. If the send total equals the receive total, the tasks are permitted to continue processing. If the totals differ, indicating that one or more messages were dropped, the master task notifies every task of the job either to roll back to a previous checkpoint or to kill the job.
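The master task's comparison described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `WindowCounts`, `CheckpointAction`, and `master_checkpoint_check` are hypothetical, the counters are modeled as plain integers rather than HFI hardware state, and because the abstract does not say how the master chooses between rolling back and killing the job, that choice is passed in as a caller-supplied policy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class CheckpointAction(Enum):
    """Hypothetical outcomes the master task can broadcast to every task."""
    CONTINUE = "continue"
    ROLLBACK = "rollback"
    KILL = "kill"


@dataclass(frozen=True)
class WindowCounts:
    """Hypothetical snapshot of one task's HFI window counters at a checkpoint."""
    task_id: int
    send_count: int
    receive_count: int


def master_checkpoint_check(
    reports: Iterable[WindowCounts],
    on_mismatch: CheckpointAction = CheckpointAction.ROLLBACK,
) -> tuple[CheckpointAction, int, int]:
    """Sum send and receive counts across all tasks and compare the totals.

    Returns (action, total_sent, total_received). The mismatch policy
    (ROLLBACK or KILL) is an assumption supplied by the caller.
    """
    if on_mismatch is CheckpointAction.CONTINUE:
        raise ValueError("on_mismatch must be ROLLBACK or KILL")

    total_sent = 0
    total_received = 0
    for report in reports:
        total_sent += report.send_count
        total_received += report.receive_count

    if total_sent == total_received:
        # Every message sent in the job was counted as received somewhere.
        return CheckpointAction.CONTINUE, total_sent, total_received

    # Totals disagree: at least one message was not accounted for,
    # for example because a packet was dropped in the fabric.
    return on_mismatch, total_sent, total_received


# Two tasks whose individual counts differ but whose totals balance (7 and 7).
action, sent, received = master_checkpoint_check(
    [WindowCounts(0, 5, 3), WindowCounts(1, 2, 4)]
)
```

Only the global totals are compared, so an individual task may legitimately send more than it receives; the check detects that some message went missing somewhere in the job, not which task or window lost it.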

Source: USPTO / EPO open patent data (bibliographic data and citation counts).