3 Correctness of Distributed Databases
3.1 Serializability
Another strand of research is concerned with distributed databases and
in fact, this is very much related to the ``multiprogramming'' and ``semaphore''
strand described before.
But the kind of properties which have been studied are slightly
different, and the focus has been put on devising optimal algorithms for the
analysis of simple transaction models. As a matter of fact, a distributed
database can be seen as a shared-memory machine (containing items)
on which processes (called
transactions) act by reading and writing, getting permissions to do so by using
the appropriate functions on attached semaphores. One of the main purposes
of this area is to ensure coherence of the distributed database while ensuring
good performance, through a definition of suitable policies (protocols) for
transactions to perform their own actions (with P and V).
This entails that deadlock-freedom of transactions is of importance. Correctness
of a distributed database is itself very often expressed by some form of
a serializability condition. Look for instance at Figure 4.
This could describe a database with two transactions T1=Pb.Vb.Pa.Va and
T2=Pa.Va.Pb.Vb trying to modify
two items a and b. All paths of execution above the left hole are equivalent
to a serial execution of transaction T2 then transaction T1. All paths
of execution below the right hole are equivalent to the serial execution of
transaction T1 then T2. The third type of dipath is not a serial dipath:
it describes several equivalent
cases, for instance: T1 acquires b, T2 acquires a, then
T1 acquires a and T2 acquires b. Think of the database to represent
airplane tickets (for instance b is the return ticket corresponding
to the one-way ticket a),
and the two transactions to represent remote booking
booths, the action between a P and its corresponding V is writing a name
on the ticket. The situation here is that T1 will have reserved its one-way
ticket and T2 will have reserved its return ticket only. This is not an allowed
behaviour. It is not equivalent to a purely serial schedule which are the only
ones that are specified as correct (only one of T1 or T2 gets the whole
lot of tickets).
Testing serializability is unfortunately known to be a NP-complete
problem (in [Papadimitriou, 1979]), even when the model is only
based on simple binary semaphores.
3.2 The geometric approach
The progress graph approach to the study of distributed databases was really
initiated in [Yannakakis et al., 1979] and then
in [Papadimitriou, 1983].
In [Lipski and Papadimitriou, 1981] an algorithm for proving the safety through
serializability of distributed databases with only two transactions
expressed as progress graphs was described. The underlying algorithmics
is relying on proving the connectedness of the closure
of the set of forbidden rectangles8
.
Of course the real problem in our previous
example was that the forbidden region was disconnected, allowing dipaths to
interleave some of the requests of different transactions.
Another algorithm, for proving freedom from deadlock
for two transactions synchronizing with binary semaphores only was also described.
It was shown to have O(n log n log log n) (where
n is the number of forbidden rectangles) time complexity. A notion
of ``directed homotopy'' was also defined. The generalization of safety conditions
to higher-dimensions
through the method of [Yannakakis et al., 1979] reducing to 2-dimensional
problems,
was shown to be in O(n d 2d+d2 log n log log n) time complexity (d
is the dimension, i.e. the number of transactions).
Much work has been done in algorithmics of these geometric problems
and the algorithm above for safety is improved (actually it is optimal
then) in
[Soisalon-Soininen and Wood, 1985] achieving O(n log n) time and O(n) space complexity for
2 transactions, then relying on M. Yannakakis
result for the extension to any dimension.
This is not the best that can be done in higher dimensions: the next
step is achieved in [Fajstrup et al., 1998] where a
direct method for unsafe regions is described and where it is shown that the
closure of the forbidden region in higher-dimension is not a strong enough
condition for serializability in general. An application to proving the
2-phase locked protocol is given in [Gunawardena, 1994], and, using dihomotopy,
in [Fajstrup et al., 1999].