Mechanism to provide reliability through packet drop detection
US7877436B2 · kind B2 · utility
Assignee
Inventors
Key dates
| Filing date | Feb 1, 2008 |
| Grant date | Jan 25, 2011 |
| Priority date | — |
| Expiry date | Mar 6, 2029 |
Classification
- Technology area (CPC G)Physics
- CPC primaryG06F9/544
- WIPO fieldComputer technology
- WIPO sectorElectrical engineering
Abstract
A method and a data processing system for completing checkpoint processing of a distributed job with local tasks communicating with other remote tasks via a host fabric interface (HFI) and assigned HFI window. Each HFI window has a send count and a receive count, which tracks GSM messages that are sent from and received at the HFI window. When a checkpoint is initiated by a master task, each local task forwards the send count and the receive count to the master task. The master task sums the respective counts and then compares the totals to each other. When the send count total is equal to the receive count total, the tasks are permitted to continue processing. However, when the send count total is not equal to the receive count total, the master task notifies each task of the job to rollback to a previous checkpoint or kill the job execution.
Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.