CSE 428: Lecture 10

Abstract Data Types

All modern high-level languages provide mechanisms for data abstraction, i.e. mechanisms for enriching the language with new types and operations on them. "Abstraction" here means that it is possible to define new types and operations in such a way that the user (a programmer using the new types) can use them just as he would use the other (primitive) data types of the language.

It is beneficial to separate the concepts of "specification" and "implementation" of an ADT.

The specification is the abstract description of the type and the behaviour of the operations. It should be implementation-independent and even language-independent. It should give the user all the information necessary to use the ADT, but no more.
The implementation, on the other hand, is the concrete representation of the elements of the new type in terms of existing types, together with the definition of the operations as functions or procedures. The implementation must of course satisfy the specification, i.e. the operations must have the types and the behaviour prescribed by the specification.

The user should use the ADT only in the ways allowed by the specification. In this way, even if the implementation changes, his programs will not need to be modified. Furthermore, he can use the properties stated in the specification (if provided) to reason about the correctness of his programs.

Some languages, like Standard Pascal and C, allow the definition of new types, but do not provide any mechanism for data protection. Namely, there is no way to "shelter" the implementation, i.e. to forbid the user to access the values of the ADT directly through the operations of their concrete representation (for instance, by reading record fields or following pointers). Such access, of course, violates the principle of the ADT and nullifies the advantages mentioned above. More modern languages, like Modula-2 and C++, have introduced mechanisms for hiding the implementation and making it externally inaccessible (see Sethi's book, Ch. 6).

We illustrate the concept of ADT by showing the specification and implementation of the type "simple list of integers" in a Pascal-like language.

Specification

In an abstract sense, a list is a sequence of "nodes", where each node contains a piece of information (an integer number in this case). "Simple" here means that the only operations of the ADT are:

the creation of the empty list, i.e. the list with 0 nodes (emptylist),
the test whether a list is empty (is_empty),
the addition of a new node in front of a list (cons),
the access to the information in the first node (head),
the list obtained by removing the first node (tail).

We now specify more precisely the type of each operation (interface). We use the following notation:

f: () -> T means that f has 0 arguments and result of type T.
f: (T1) -> T2 means that f has 1 argument of type T1 and result of type T2.
f: (T1,T2) -> T means that f has 2 arguments, of types T1 and T2 respectively, and result of type T.
... etc.

Having in mind this notation, we define the type of the operations on simple lists as follows:

emptylist: () -> list
is_empty: (list) -> boolean
cons: (integer,list) -> list
head: (list) -> integer
tail: (list) -> list

The above is the "abstract definition" of simple lists, i.e. the specification of the ADT. Note that we don't say anything about the implementation here.

We might add to this specification the property that lists are non-circular structures, i.e. that they contain no loops.

Actually, real specifications of ADTs should give the specification of the algebra, i.e. the semantics of the operations, in a more detailed and formal way than our abstract description. Furthermore, a good specification should include every property that might be relevant for the programmer. However, we won't go into details about formal methods for ADT specification, since that is beyond the scope of this course.
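To give an idea of what such a formal specification looks like, the intended behaviour of the operations on simple lists can be written as equations that hold for every integer x and every list L. This is only a sketch: the lecture does not state these axioms, and leaving head(emptylist()) and tail(emptylist()) undefined (an error) is our assumption.

```latex
\begin{aligned}
\mathit{is\_empty}(\mathit{emptylist}()) &= \mathit{true} \\
\mathit{is\_empty}(\mathit{cons}(x, L)) &= \mathit{false} \\
\mathit{head}(\mathit{cons}(x, L)) &= x \\
\mathit{tail}(\mathit{cons}(x, L)) &= L
\end{aligned}
```

A user can reason with these equations alone, without looking at any implementation; for example, they give head(tail(cons(1, cons(2, emptylist())))) = head(cons(2, emptylist())) = 2.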

Implementation

As mentioned above, the implementation of an ADT consists of two parts:

concrete representation of the elements of the new type in terms of existing types,
definition of the operations as functions or procedures

Representation of lists

One possible concrete representation of lists uses records and pointers. In this approach, each node must store not only its information, but also a pointer to the next node in the sequence. We will call the concrete counterpart of nodes "elements" to avoid confusion. Thus an element will be a record, with a field "info" of type integer, and a field "next" of type pointer to element. A list will then be just a pointer to the first element of the sequence, and the empty list will be represented by nil. The definition of the type list, in a Pascal-like language, would then be:

   type list = ^element;
        element = record
                    info : integer;
                    next : list
                  end;

Definition of the operations

   function emptylist : list;
      begin 
      emptylist := nil
      end;


   function is_empty(L:list): boolean;
      begin
      if L = nil then is_empty := true else is_empty := false
      end;


   function cons(x: integer; L: list): list;
      var aux : list;
      begin
      new(aux);
      aux^.info := x; 
      aux^.next := L;
      cons := aux
      end;


   function head(L:list): integer;
      begin
      if L = nil 
         then head := 0  { error }
         else head := L^.info
      end; 


   function tail(L:list): list;
      begin
      if L = nil 
         then tail := nil  { error }
         else begin
              tail := L^.next;
              dispose(L)   { We might want to eliminate this instruction, }
              end          { because dispose(L) causes side effects on    }
      end;                 { other lists sharing L.                       }

Note that the property that lists are non-circular structures is satisfied by this implementation: it is not possible to create circular lists by using only the functions above.
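For readers more familiar with C than with Pascal, the same representation and the five operations can be sketched in C. This is our rendering, not part of the lecture, and it makes a few assumptions: malloc plays the role of new, an error on the empty list is reported with assert instead of returning 0 or nil, and tail follows the suggestion in the comment above and does not free the removed element, so lists may safely share their tails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Concrete representation: an element is a record with an integer and a
   pointer to the next element; a list is a pointer to its first element. */
struct element {
    int info;
    struct element *next;
};
typedef struct element *list;

/* The empty list is represented by the null pointer (nil in Pascal). */
list emptylist(void) {
    return NULL;
}

bool is_empty(list L) {
    return L == NULL;
}

/* Allocate a new element in front of L; L itself is shared, not copied. */
list cons(int x, list L) {
    list aux = malloc(sizeof *aux);   /* plays the role of new(aux) */
    if (aux == NULL)
        abort();                      /* out of memory */
    aux->info = x;
    aux->next = L;
    return aux;
}

/* Precondition: L is not empty (checked with assert in this sketch). */
int head(list L) {
    assert(!is_empty(L) && "head of the empty list");
    return L->info;
}

/* Precondition: L is not empty. Unlike the Pascal version, nothing is
   freed here, so other lists that share L's elements stay valid. */
list tail(list L) {
    assert(!is_empty(L) && "tail of the empty list");
    return L->next;
}
```

For example, cons(1, cons(2, emptylist())) builds the list 1, 2: its head is 1 and the head of its tail is 2. Since cons only ever links a fresh element to an already existing list, this version cannot create circular lists either.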

Using the ADT "simple list"

A correct use of the ADT should rely only on the operations of the interface, i.e. emptylist, is_empty, cons, head and tail. For instance, consider the two possible definitions of the append function below: the first respects this principle, the second doesn't. Of course, if the language offered a mechanism to protect the ADT, the second could not even be written.

"Good" append

   function append(L1, L2: list): list;
      begin
      if is_empty(L1)
         then append := L2
         else append := cons(head(L1),append(tail(L1),L2))
      end;
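The "good" append can be sketched in C as well. The block repeats a minimal version of the list representation and operations so that it compiles on its own; as an assumption of this sketch, its tail does not free the removed element. The append copies the elements of L1 and shares L2, leaving both arguments intact. This is one more reason to drop dispose from tail: with the disposing version, every call of append on a non-empty L1 would dispose all the elements of L1, and head(L1) would even read a disposed element if tail(L1) happened to be evaluated before it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Minimal copy of the list representation and operations, repeated so
   that this example compiles on its own. tail does not free anything. */
struct element { int info; struct element *next; };
typedef struct element *list;

list emptylist(void) { return NULL; }
bool is_empty(list L) { return L == NULL; }

list cons(int x, list L) {
    list aux = malloc(sizeof *aux);
    if (aux == NULL)
        abort();                      /* out of memory */
    aux->info = x;
    aux->next = L;
    return aux;
}

int head(list L) { assert(!is_empty(L)); return L->info; }
list tail(list L) { assert(!is_empty(L)); return L->next; }

/* "Good" append: uses only the interface operations. The result holds
   fresh copies of the elements of L1 followed by L2 itself (shared). */
list append(list L1, list L2) {
    if (is_empty(L1))
        return L2;
    return cons(head(L1), append(tail(L1), L2));
}
```

For example, appending the list 3 to the list 1, 2 gives 1, 2, 3, whose last element is the very element of L2, while L1 still reads 1, 2 afterwards.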

"Bad" append

   function append(L1, L2: list): list;
      begin
      if L1 = nil 
         then append := L2
         else append := cons(L1^.info,append(L1^.next, L2))
      end;

Note that the second definition is not implementation-independent: if we change the implementation of lists, then the second definition will not be valid anymore. The first one, on the contrary, will still be valid, provided of course that the new implementation respects the specification.
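The last point can be made concrete with a small C sketch that changes the implementation. The representation below is hypothetical, invented only for this illustration: each cell stores its integer in a field named value, the rest of the list in a field named rest, and also the length of the list starting at that cell. The "good" append is copied unchanged and still works, while the "bad" one, which refers to the fields info and next, would not even compile against this representation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* A different (hypothetical) representation of the same ADT: new field
   names, plus the length of the list that starts at each cell. */
struct cell {
    int value;
    int length;              /* number of cells from here to the end */
    struct cell *rest;
};
typedef struct cell *list;

list emptylist(void) { return NULL; }
bool is_empty(list L) { return L == NULL; }

list cons(int x, list L) {
    list c = malloc(sizeof *c);
    if (c == NULL)
        abort();                      /* out of memory */
    c->value = x;
    c->length = 1 + (is_empty(L) ? 0 : L->length);
    c->rest = L;
    return c;
}

int head(list L) { assert(!is_empty(L)); return L->value; }
list tail(list L) { assert(!is_empty(L)); return L->rest; }

/* The "good" append, copied verbatim: it depends only on the interface,
   so it compiles and works with this representation too. */
list append(list L1, list L2) {
    if (is_empty(L1))
        return L2;
    return cons(head(L1), append(tail(L1), L2));
}

/* A "bad" append using L1->info and L1->next would fail to compile here,
   because struct cell has no fields with those names. */
```

The behaviour seen through the interface is the same as with the record-and-pointer representation of the lecture; only code that looked inside the representation would notice the change.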