4 Computability of Fault-Tolerant Distributed
4.1 ResilienceThe strand of work here is concerned with the robust or
fault-tolerant implementation of distributed programs. More
precisely, the interest here is in wait-free or t-resilient
(0 £ t £ n-1 where n is the number of processors involved,
wait-free being n-1-resilient)
on a distributed machine composed of several units communicating through a
shared memory via atomic read/write or FIFO queues etc. or through
synchronous/asynchronous message passing. This means that
the processes executed on the n processors
must be as loosely coupled as possible so that even if t processors fail to
terminate, the others will carry on their computation and find a correct
partial result (as observed on the non-faulty ones).
Consider as an example a machine with two processors, P1 and P2,
communicating by atomic reads and writes on a shared memory. Each of
these processes has a local binary variable x1 (for P1) and x2
(for P2) respectively. The binary consensus problem is to design an
algorithm for having P1 and P2 agree on one value, which, moreover, has
to be a value that
one of them started with. Therefore if x1=0 and x2=1 at the beginning,
we want them to be x1=0 and x2=0 or x1=1 and x2=1 at the end.
Of course this is complicated by the fact that we ask the solution to be
wait-free. If one of the two processes fails (the other process being unable
to know whether this process is dead or just very slow) then the non-faulty
one must terminate with one of the values P1 or P2 started with
It is unfortunately proved in [Fisher et al., 1985] that this
is not possible -- in fact it was proven for the equivalent case
of an asynchronous message-passing
system, in the general 1-resilient case.
The article used an argument from graph theory (but already
quite geometric in nature). It has in fact originated the geometric point
of view on this field of research as we are going to
4.2 From graph to simplicial setsThe main argument can be understood in broad terms
(``similarity chain'') as a connectedness
result. If we represent the local states of each processor Pi
(i=1,2), at some point
of the execution of a program, by a vertex (Pi,xi) and the global
states of the system by an (un-oriented) edge linking the two local states
which compose them, then it is easy to see that the semantics of atomic reads
and writes imposes that the reachable global states starting from one initial
global state have to form a connected graph. There is no way a program on
such a machine can ``break'' connectivity. This implies that the binary
consensus cannot be implemented here since we must be able to reach from the
global state ((P1,0),(P2,1))
two disconnected global states ((P1,0),(P2,0))
and ((P1,1),(P2,1)) as shown on Figure 7.
Figure 7: The binary consensus decision map from initial global states
to final global states
Further work generalizes this simple geometric argument to the needed
In [Biran et al., 1988] a characterization of a class of problems
solvable in asynchronous message-passing systems in the presence of
a single failure was given. No generalization to more failures has since been
solved using the same kinds of graph techniques.
It was then a rather shared belief that one would have to use more
powerful techniques in that case.
The conjecture [Chaudhuri, 1990] that the k-set agreement problem
(a kind of weak consensus;
all is required is that the non-faulty processes eventually agree on a subset
of at most k values taken from the input values) cannot
be solved in certain asynchronous systems was finally proven
in three different papers independently, [Borowsky and Gafni, 1993],
[Saks and Zaharoglou, 1993] and [Herlihy and Shavit, 1993].
Figure 8: The synchronous protocol complex (for 3 processes) after one round
The basic idea of [Herlihy and Shavit, 1993] is to generalize such pictures
as the one of Figure 7 using simplicial sets instead
of graphs (which are special cases of the former). Simplicial sets
are made up of vertices, edges but also triangles, simplexes in general glued
altogether. A simplex of dimension n represents the global state, at some
point of the execution, of n processes. Vertices are still pairs composed of
the name of the process, together with its local state. Again, given an
initial global state, the semantics of the operations of the distributed machine
we want to study is defined by the reachable global states, at any time. For instance,
Figure 8 shows the simplicial set, called protocol complex, after
one round of communication on a synchronous message-passing machine, which broadcasts
the local states to all processes at each step. If there is a simplicial map from
it to some suitable set of global states, respecting the specification of the
problem, then there exists a corresponding wait-free protocol. This enables us
also to compute how many rounds of communication might be necessary to solve a given
problem. Notice that the simplexes of the protocol complex are really the
schedules, and this should be related to the directed homotopy approach of
Section 2. This has been
hinted in [Goubault, 1996a], [Goubault, 1996b], [Goubault, 1997] for
simple cases only.
4.3 Some resultsIt was also proved in
[Herlihy and Shavit, 1993] that in a shared-memory model with single
reader/single writer registers providing atomic read and write
operations, k-set agreement
requires at least ë f/k û +
1 rounds where f is the number of processes that can fail.
The renaming task (processes must try to agree
on a smaller set of names between each other than the original set of names),
proposed in [Attiya et al., 1990] was also finally
solved in [Herlihy and Shavit, 1993]. There is a wait-free protocol for the
renaming task in certain asynchronous systems if the output name space is
sufficiently large. It was already known that there is a wait-free solution
for the renaming task for 2n+1 or more output names on a system of
n+1 asynchronous processors and none for n+2 or fewer output names. M. Herlihy
and N. Shavit refined this result and showed that there was no solution for
strictly less that 2n+1 output names.
impossibility results can be given but also constructive means for
finding algorithms follow from this work (for instance
[Herlihy and Shavit, 1994]).
More generally, datatypes do matter for computability results. It is known
for quite a long time that consensus for two processes can be solved with shared memory with
atomic reads and writes plus a FIFO queue,
or plus an atomic test&set operation.
If we define the consensus number of a data type as the maximal
number of asynchronous processors (having atomic read and write) on
which it can implement wait-free consensus, then,
These facts motivated the introduction of the following general
problem, dealing with
the power of the architecture of distributed machines. We say that a
datatype, or object, is an (m,j)-consensus object if it allows any set of m processes
to solve j-set agreement tasks. In [Herlihy and Rajsbaum, 1994], Herlihy and Rajsbaum (see
also [Borowsky and Gafni, 1993]) proved that is is impossible to implement
(n+1,k)-consensus using (m,j)-consensus objects if n/k > m/j.
- atomic read/write registers have consensus number 1,
- test&set and fetch&add registers, queues, and stacks have consensus
- n-register assignment has consensus number 2n-2,
- load-locked, store-conditional and compare-and-swap registers have
consensus number ¥.
Further work on this can be found in
[Jayanti, 1993], [Jayanti, 1997] and
[Schenk, 1997]. In particular, the problem identified in
[Jayanti, 1997] is that the hierarchy of data objects as briefly sketched
above is not robust, in the sense that it is possible to implement some
datatypes with consensus number k using several datatypes with consensus
numbers strictly less than k.
This of course is due to some subtle interactions between the use of these
data objects and we could hope that the more general study of the schedules
with such datatypes could lead to some better classifications.
In ``Algebraic Spans'' in a forthcoming issue of Mathematical Structures in
Computer Science, M. Herlihy and S. Rajsbaum
introduce a new tool (related to the one described in [Herlihy and Rajsbaum, 1995]
for proving impossibility results,
based on a core theorem of algebraic topology, the acyclic carrier theorem,
generalizes and extends earlier results.