Patent · US Expired

Method and system for achieving collective consistency in detecting failures in a distributed computing system

US5682470A · kind A · utility

Cited by: 19
References: 15
Claims: 47
Family size: 0

Assignee

Inventors

Key dates

Filing date: Sep 1, 1995
Grant date: Oct 28, 1997
Priority date
Expiry date: Sep 1, 2015

Classification

  • Technology area (CPC G): Physics
  • CPC primary: G06F11/1425
  • WIPO field: Computer technology
  • WIPO sector: Electrical engineering

Abstract

A method and apparatus are disclosed for achieving collective consistency in the detection and reporting of failures in a distributed computing system having multiple processors. Each processor is capable of being called by a parallel application for system status. Initially, each processor sends the other processors its view of the status of the processors. It then waits for similar views from other processors except those regarded as failed in its own view. If the received views are identical to the view of the processor, the processor returns its view to the parallel application. In a preferred embodiment, if the views are not identical to its view, the processor sets its view to the union of the received views and its current view. The steps are then repeated. Alternatively, the steps are repeated if the processor does not have information that each of the processors not regarded as failed in its view forms an identical union view. In another preferred embodiment, the method is terminated if a quorum is not formed by the processors which are not regarded as failed. Alternatively, after sending its view, the processor waits for an exit condition. Depending on the exit condition, t…
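
Illustrative sketch (not part of the patent record): the basic send, wait, compare and union loop described in the abstract can be simulated in a few lines of Python. Everything here is an assumption made for illustration. The function name agree_on_failures, the lock-step rounds, the max_rounds cap, and the rule that a crashed peer is simply skipped after a timeout are hypothetical and do not come from the patent text. The sketch also assumes a view only ever lists processors that have really crashed, and it leaves out the quorum and exit-condition variants.

```python
from typing import Dict, FrozenSet, Set


def agree_on_failures(
    initial_views: Dict[int, Set[int]],
    all_ids: Set[int],
    max_rounds: int = 10,
) -> Dict[int, FrozenSet[int]]:
    """Simulate the view exchange in lock-step rounds (hypothetical model).

    initial_views maps each live processor id to the set of processor ids
    it currently regards as failed. Ids in all_ids that have no entry in
    initial_views are treated as crashed: they never send a view.
    """
    views = {p: set(v) for p, v in initial_views.items()}
    for _ in range(max_rounds):
        # Step 1: every live processor sends its current view to the others.
        sent = {p: frozenset(v) for p, v in views.items()}
        all_identical = True
        for p, own in sent.items():
            # Step 2: wait for views from peers not regarded as failed by p.
            expected = all_ids - own - {p}
            # Assumption: a crashed peer never answers and a timeout skips it.
            received = [sent[q] for q in expected if q in sent]
            if any(v != own for v in received):
                all_identical = False
                # Step 4: new view = union of received views and own view.
                views[p] = set(own).union(*received)
        # Step 3: under the stated assumptions every live processor waits on
        # the same live peers, so all of them see identical views in the
        # same round and each returns its view to the application.
        if all_identical:
            return sent
    raise RuntimeError("no agreement within max_rounds")


# Processor 4 has crashed; only processor 1 has noticed so far.
result = agree_on_failures({1: {4}, 2: set(), 3: set()}, all_ids={1, 2, 3, 4})
print({p: sorted(v) for p, v in result.items()})
```

In this run the first round spreads processor 1's knowledge to processors 2 and 3 through the union step, and the second round finds all three views identical, so each processor reports processor 4 as failed.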

Source: USPTO / EPO open patent data. Objective bibliographic and citation counts.