Correctness of Distributed Databases

3 Correctness of Distributed Databases

3.1 Serializability

Another strand of research is concerned with distributed databases and in fact, this is very much related to the ``multiprogramming'' and ``semaphore'' strand described before. But the kind of properties which have been studied are slightly different, and the focus has been put on devising optimal algorithms for the analysis of simple transaction models. As a matter of fact, a distributed database can be seen as a shared-memory machine (containing items) on which processes (called transactions) act by reading and writing, getting permissions to do so by using the appropriate functions on attached semaphores. One of the main purposes of this area is to ensure coherence of the distributed database while ensuring good performance, through a definition of suitable policies (protocols) for transactions to perform their own actions (with P and V). This entails that deadlock-freedom of transactions is of importance. Correctness of a distributed database is itself very often expressed by some form of a serializability condition. Look for instance at Figure 4. This could describe a database with two transactions T₁=Pb.Vb.Pa.Va and T₂=Pa.Va.Pb.Vb trying to modify two items a and b. All paths of execution above the left hole are equivalent to a serial execution of transaction T₂ then transaction T₁. All paths of execution below the right hole are equivalent to the serial execution of transaction T₁ then T₂. The third type of dipath is not a serial dipath: it describes several equivalent cases, for instance: T₁ acquires b, T₂ acquires a, then T₁ acquires a and T₂ acquires b. Think of the database to represent airplane tickets (for instance b is the return ticket corresponding to the one-way ticket a), and the two transactions to represent remote booking booths, the action between a P and its corresponding V is writing a name on the ticket. The situation here is that T₁ will have reserved its one-way ticket and T₂ will have reserved its return ticket only. This is not an allowed behaviour. It is not equivalent to a purely serial schedule which are the only ones that are specified as correct (only one of T₁ or T₂ gets the whole lot of tickets).

Testing serializability is unfortunately known to be a NP-complete problem (in [Papadimitriou, 1979]), even when the model is only based on simple binary semaphores.

3.2 The geometric approach

The progress graph approach to the study of distributed databases was really initiated in [Yannakakis et al., 1979] and then in [Papadimitriou, 1983]. In [Lipski and Papadimitriou, 1981] an algorithm for proving the safety through serializability of distributed databases with only two transactions expressed as progress graphs was described. The underlying algorithmics is relying on proving the connectedness of the closure of the set of forbidden rectangles ⁸ . Of course the real problem in our previous example was that the forbidden region was disconnected, allowing dipaths to interleave some of the requests of different transactions. Another algorithm, for proving freedom from deadlock for two transactions synchronizing with binary semaphores only was also described. It was shown to have O(n log n log log n) (where n is the number of forbidden rectangles) time complexity. A notion of ``directed homotopy'' was also defined. The generalization of safety conditions to higher-dimensions through the method of [Yannakakis et al., 1979] reducing to 2-dimensional problems, was shown to be in O(n d 2^d+d² log n log log n) time complexity (d is the dimension, i.e. the number of transactions). Much work has been done in algorithmics of these geometric problems and the algorithm above for safety is improved (actually it is optimal then) in [Soisalon-Soininen and Wood, 1985] achieving O(n log n) time and O(n) space complexity for 2 transactions, then relying on M. Yannakakis result for the extension to any dimension. This is not the best that can be done in higher dimensions: the next step is achieved in [Fajstrup et al., 1998] where a direct method for unsafe regions is described and where it is shown that the closure of the forbidden region in higher-dimension is not a strong enough condition for serializability in general. An application to proving the 2-phase locked protocol is given in [Gunawardena, 1994], and, using dihomotopy, in [Fajstrup et al., 1999].