CSE 428: Concrete, recursive and polymorphic types.


Concrete Datatypes

So far we have seen some of SML's built-in datatypes, including base types, pairs, and the inductively defined list datatype. As in any practical language, SML supports user-defined datatypes. The most common form is the concrete datatype. You can think of concrete datatypes as corresponding to the enumerated and record types found in other languages, but they are more expressive.

Introduction

Concrete datatypes (as opposed to Abstract Datatypes) introduce a new type name and a set of constructors for building up data objects of the new type. The general syntax for datatype declarations is
datatype T = C1 of T1 | C2 of T2 | ... | Cn of Tn
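in which T is the new type being defined, the Ci are the new constructors being introduced, and the Ti are the types of the arguments supplied to the constructors. The "of Ti" part is optional: a constructor declared without it takes no argument and is simply a constant of type T.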

Enumerated Types

To begin, we give an example of an enumerated type. An enumerated type defines a set of constants, all of which have the same type, which is also introduced in the declaration. Consider the following declaration for directions:
datatype direction = north | south | east | west
Note the similarity of this syntax to the syntax for context-free grammars. This similarity is no coincidence: if you understand grammars, you can make sense of datatype declarations. The effect of the above declaration is to introduce the following constants:
north : direction
south : direction
east  : direction
west  : direction
The order makes no difference. (There is no notion of successor or ordinal value for these constants.)

Having new data objects is nice, but they are useless unless we can define operations on them. (What good are integers unless you have operations and can define functions which manipulate them?) Fortunately, pattern matching can be used on constructors introduced via datatype declarations. Here is a simple function which takes a direction and returns the direction 90 degrees, clockwise, from that direction:

fun rotate north = east
   |rotate east = south
   |rotate south = west
   |rotate west = north
As you might suspect, the type of this function is direction -> direction.
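As a quick sanity check on rotate, the following sketch (assuming an SML '97 system with the standard Basis Library, such as SML/NJ) also defines rotateCCW, a hypothetical counterclockwise rotation that is not part of these notes, and verifies two expected properties: four clockwise turns return to the starting direction, and a clockwise turn undoes a counterclockwise one.

```sml
datatype direction = north | south | east | west

(* rotate: the clockwise rotation from the notes *)
fun rotate north = east
  | rotate east = south
  | rotate south = west
  | rotate west = north

(* rotateCCW: hypothetical counterclockwise rotation, written the same way *)
fun rotateCCW north = west
  | rotateCCW west = south
  | rotateCCW south = east
  | rotateCCW east = north

val dirs = [north, east, south, west]

(* true if four clockwise turns bring every direction back to itself *)
val fourTurns = List.all (fn d => rotate (rotate (rotate (rotate d))) = d) dirs

(* true if a clockwise turn undoes a counterclockwise turn for every direction *)
val undoes = List.all (fn d => rotate (rotateCCW d) = d) dirs
```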

Constructors also admit an equality operation, so we can compare two expressions of type direction. For example, we could define rotate in the following, more cumbersome, way:

fun rotate d = if d=north then east
               else if d=east then south
               else if d=south then west
               else north
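Since direction admits equality, we can check directly that the two definitions agree on every constructor. In this sketch the second version is renamed rotateIf (a hypothetical name) so that both can coexist.

```sml
datatype direction = north | south | east | west

(* pattern-matching version *)
fun rotate north = east
  | rotate east = south
  | rotate south = west
  | rotate west = north

(* equality-test version, renamed rotateIf so both definitions can coexist *)
fun rotateIf d = if d = north then east
                 else if d = east then south
                 else if d = south then west
                 else north

(* true if the two versions give the same result on every direction *)
val agree = List.all (fn d => rotate d = rotateIf d) [north, east, south, west]
```

One reason to prefer the pattern-matching version: if a new constructor were later added to direction, the compiler would warn that rotate no longer covers every case, whereas rotateIf would silently send the new direction to north.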
Since a datatype declaration is an ordinary declaration, we can also declare local datatypes:
local
 datatype color = red | white | blue
in 
  ...
end
Both the type name color and the constants red, white, and blue are only visible in the "..." region.
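A local datatype is still useful when the values built inside the local block are converted to something visible outside. In this sketch, colorName and flagColors are hypothetical names: colorName is declared next to the datatype (between local and in), so it is private as well, and only flagColors, whose type string list does not mention color, is visible after end.

```sml
local
  (* color and its constants are private to this local block *)
  datatype color = red | white | blue

  (* private helper that turns a color into its name *)
  fun colorName red = "red"
    | colorName white = "white"
    | colorName blue = "blue"
in
  (* exported; its type, string list, does not mention color *)
  val flagColors = map colorName [red, white, blue]
end
```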

DataTypes which contain Data

Instead of simply defining constants, we can define constructors that take an argument. Consider the following example:
datatype year = freshman | sophmore | junior | senior;
datatype rank = assistant | associate | full;
datatype person = student of (string * year) | faculty of (string * rank)
                | staff of (string * int);
The first two declarations introduce enumerated types. The third introduces the type person and three constructors:
student : (string * year) -> person
faculty : (string * rank) -> person
staff   : (string * int)  -> person
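Because a constructor with an argument has an ordinary function type, it can be used wherever a function is expected. The sketch below (the name students and the second student are made-up data) passes student to map:

```sml
datatype year = freshman | sophmore | junior | senior
datatype rank = assistant | associate | full
datatype person = student of (string * year) | faculty of (string * rank)
                | staff of (string * int)

(* student : string * year -> person, so map can apply it to each pair *)
val students = map student [("Sally", junior), ("Tom", freshman)]
```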
Using standard typing rules, we can see how to build up objects of type person. For example, the following are all correct:
student("Sally",junior)
faculty("John", associate)
staff("Suzie", 10)
The arguments to constructors can be arbitrary expressions, as long as they have the correct type. So the following are also valid:
let val x = "Sally" in student(x, if true then junior else senior) end
let val p = ("John"^"Hannan",associate) in faculty p end
(The infix operator "^" is the string concatenation operator.)

We can again define functions which use pattern matching to destruct data objects into their subparts. In fact, this is the only way to get at the subparts of a structured data object. (Remember lists? What is the only way to access the head and tail of a list?)

fun name (student(N,_)) = N
   |name (staff(N,_)) = N
   |name (faculty(N,_)) = N;
fun norespect (faculty(_,assistant)) = true
   |norespect (student(_,Y)) = (Y=freshman orelse Y=sophmore)
   |norespect (staff(_,R)) = (R<5)
   |norespect _ = false
The function norespect is of type person -> bool and returns true if the person is an assistant professor, a freshman or sophomore, or a staff member of grade less than 5.
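To see name and norespect in action, here is a sketch over a small, made-up roster; roster and unlucky are hypothetical names, and List.filter is the standard Basis Library function that keeps the list elements satisfying a predicate.

```sml
datatype year = freshman | sophmore | junior | senior
datatype rank = assistant | associate | full
datatype person = student of (string * year) | faculty of (string * rank)
                | staff of (string * int)

fun name (student(N,_)) = N
  | name (staff(N,_)) = N
  | name (faculty(N,_)) = N

fun norespect (faculty(_,assistant)) = true
  | norespect (student(_,Y)) = (Y = freshman orelse Y = sophmore)
  | norespect (staff(_,R)) = (R < 5)
  | norespect _ = false

(* a small, made-up roster *)
val roster = [student ("Sally", junior), faculty ("John", associate),
              staff ("Suzie", 10), student ("Ann", freshman),
              faculty ("Pat", assistant), staff ("Lee", 3)]

(* names of everyone for whom norespect holds, in roster order *)
val unlucky = map name (List.filter norespect roster)   (* ["Ann", "Pat", "Lee"] *)
```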

Note how this datatype is a kind of variant record or structure, except that instead of explicit field names, we simply have subterms. Instead of defining a record for faculty with field names name and rank, we simply define a constructor which takes a (name,rank) pair. Of course, it is up to the programmer to remember the type (and intention) of the argument to the constructor. (SML does have a distinct record datatype, but we will not be studying it.)

Recursive DataTypes

So far, we really haven't seen any great advantage of SML's datatypes when compared to the datatypes of other languages (though I would argue that the use of pattern matching provides an elegant means for writing functions over datatypes). A distinguishing feature of SML's concrete datatypes is that they can be recursively (inductively) defined.

Consider the following inductive definition of nonempty binary trees in which the leaves contain integers:
Leaf(n) is a tree, for any integer n;
if T1 and T2 are trees, then Node(T1,T2) is a tree.

The terms Leaf and Node act as labels or constructors. To define such a datatype in C or Pascal we would use pointers and define a record with a field which contains a pointer to the tree datatype. This is a rather inelegant, though efficient, solution to providing recursive datatypes. SML provides an elegant, direct way of defining it:
datatype tree = Leaf of int | Node of (tree * tree);
That's it! Note that this is a recursive definition because the datatype is defined in terms of itself. We again use standard typing rules to see how to form data objects of type tree:
Leaf 5  : tree
Node(Leaf 3, Leaf 1)  : tree
Node(Node(Leaf 3, Leaf 1), Leaf 4)  : tree
let val t1 = Node(Leaf 3, Leaf 1) in Node(t1,t1) end   : tree
Again, we define functions over the new datatype via pattern matching. Consider the following function for computing the maximum depth of a tree:
fun maxdepth (Leaf _) = 1
   |maxdepth (Node(T1,T2)) = Int.max(maxdepth T1, maxdepth T2) + 1;
(where Int.max, from the SML Basis Library, returns the larger of two integers). Is this simple or what? This is just how you would define this function if you were to describe it in written language:
The maxdepth of a leaf is 1; the maxdepth of a node is one greater than the maximum of the maxdepth of the two subtrees.
It just doesn't get any simpler. No pointer nonsense, no assignment to keep track of a maximum value, etc. Isn't SML wonderful?
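A sketch that evaluates maxdepth on trees built as above; it uses Int.max from the Basis Library as the maximum function, and the comments give the depths the definition produces.

```sml
datatype tree = Leaf of int | Node of (tree * tree)

fun maxdepth (Leaf _) = 1
  | maxdepth (Node(T1,T2)) = Int.max(maxdepth T1, maxdepth T2) + 1

val t1 = Node(Leaf 3, Leaf 1)
val d1 = maxdepth (Leaf 5)              (* 1: a single leaf *)
val d2 = maxdepth t1                    (* 2: one node above two leaves *)
val d3 = maxdepth (Node(t1, Leaf 4))    (* 3: the deeper left subtree wins *)
```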

Not convinced? OK, more examples. Let's generalize the above definition of trees to allow integers at internal nodes as well:

datatype tree = Leaf of int | Node of (tree * int * tree);
The choice of putting the int argument between the two tree arguments is arbitrary, but we need to remember it. (We could have put it first: (int * tree * tree).) Here's the function which takes a tree and returns the list of integers representing the left-to-right inorder traversal of the tree:
fun inorder (Leaf N) = N::nil
   |inorder (Node(T1,N,T2)) = (inorder T1) @ (N::nil) @ (inorder T2)
(The infix operator "@" is SML's built-in append function.) Again, is this simple or what? Don't you wish you had used SML in CSE 465? It's trivial to modify this function to obtain pre-order, post-order, right-to-left, you-name-it.
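To illustrate how little changes between traversals, here is a sketch of preorder and postorder (our names, not from the notes) alongside inorder; only the position of N in the right-hand side moves.

```sml
datatype tree = Leaf of int | Node of (tree * int * tree)

(* left subtree, then node value, then right subtree *)
fun inorder (Leaf N) = N::nil
  | inorder (Node(T1,N,T2)) = (inorder T1) @ (N::nil) @ (inorder T2)

(* node value first, then left subtree, then right subtree *)
fun preorder (Leaf N) = [N]
  | preorder (Node(T1,N,T2)) = N :: (preorder T1 @ preorder T2)

(* both subtrees first, node value last *)
fun postorder (Leaf N) = [N]
  | postorder (Node(T1,N,T2)) = postorder T1 @ postorder T2 @ [N]

val t = Node(Node(Leaf 1, 2, Leaf 3), 4, Leaf 5)
(* inorder t = [1,2,3,4,5], preorder t = [4,2,1,3,5], postorder t = [1,3,2,5,4] *)
```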

Polymorphic DataTypes

The general form of concrete datatype declarations supports polymorphic (or parametric) datatypes. As a simple example, recall that the previous definition of binary trees had a fixed type (integer) of expressions/values stored in the nodes and leaves of the tree. We might call these trees integer binary trees. We would like to define 'a binary trees: trees whose nodes and leaves contain expressions/values of type 'a.

The syntax for this is the following:

datatype 'a tree = Leaf of 'a | Node of (('a tree) * 'a * ('a tree));
We explicitly include a type variable in the declaration. You can think of tree as a type constructor which takes any type (say t) and returns a new type (t tree). This is the same postfix notation used for list types such as int list; list is itself a type constructor of this kind.

Note that the integer trees above are simply an instance of this type. But now we can also construct (and manipulate) trees containing reals, or booleans, or ... Any tree, however, can contain only one type of element: no mixing types in a tree. The type declaration enforces this. If we instantiate 'a above to bool, then Node is a constructor which takes an argument of type bool tree * bool * bool tree, so its two subtrees are also bool trees.
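A sketch showing that the polymorphic constructors build trees of several element types, assuming the 'a tree declaration above; the value names are hypothetical. The commented-out line is the kind of mixed tree the type checker rejects.

```sml
datatype 'a tree = Leaf of 'a | Node of (('a tree) * 'a * ('a tree))

(* the same constructors, instantiated at three element types *)
val intTree  : int tree  = Node(Leaf 1, 2, Leaf 3)
val boolTree : bool tree = Node(Leaf true, false, Leaf true)
val realTree : real tree = Node(Leaf 1.5, 2.5, Leaf 3.5)

(* val bad = Node(Leaf 1, true, Leaf 3)
   rejected: 'a cannot be both int and bool *)
```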

The function inorder defined above still works with this new, polymorphic definition of trees, only now its type is 'a tree -> 'a list. (You must reload the definition of inorder after redefining the tree datatype.) We could define a function which assumes the nodes and leaves contain integers, for example:

fun sumtree (Leaf n) = n
   |sumtree (Node(t1,n,t2)) = sumtree t1 + n + sumtree t2;
This function only works on int trees, so its type is int tree -> int.
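A sketch contrasting the two functions, assuming the polymorphic 'a tree declaration: inorder is applied at two different element types, while sumtree accepts only an int tree. The example tree t and the names xs, ws, and total are hypothetical.

```sml
datatype 'a tree = Leaf of 'a | Node of (('a tree) * 'a * ('a tree))

(* polymorphic: 'a tree -> 'a list *)
fun inorder (Leaf N) = N::nil
  | inorder (Node(T1,N,T2)) = (inorder T1) @ (N::nil) @ (inorder T2)

(* int tree -> int: with no other type information, + defaults to int *)
fun sumtree (Leaf n) = n
  | sumtree (Node(t1,n,t2)) = sumtree t1 + n + sumtree t2

val t = Node(Node(Leaf 1, 2, Leaf 3), 4, Leaf 5)
val xs = inorder t                                (* [1,2,3,4,5] : int list *)
val ws = inorder (Node(Leaf "a", "b", Leaf "c"))  (* ["a","b","c"] : string list *)
val total = sumtree t                             (* 15 *)
```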

We can have more than one type variable in the declaration of a polymorphic datatype:

datatype ('a,'b) tree = Leaf of 'b 
                      | Node of ((('a,'b) tree) * 'a * (('a,'b) tree));
This defines binary trees whose leaves contain terms of type 'b and whose internal nodes contain terms of type 'a. How would you describe the following definition of trees?
datatype ('a,'b) tree = Leaf of 'b 
                      | Node of ((('b,'a) tree) * 'a * (('b,'a) tree));
This is a legal definition and actually useful in some settings.
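Returning to the first two-parameter definition (leaves of type 'b, internal nodes of type 'a), here is a sketch of two hypothetical functions, leaves and nodes, whose inferred types ('a,'b) tree -> 'b list and ('a,'b) tree -> 'a list show the two type variables being tracked separately. The sample tree resembles an arithmetic expression, with operator names at the nodes and integers at the leaves.

```sml
datatype ('a,'b) tree = Leaf of 'b
                      | Node of ((('a,'b) tree) * 'a * (('a,'b) tree))

(* leaves : ('a,'b) tree -> 'b list, collected left to right *)
fun leaves (Leaf x) = [x]
  | leaves (Node(l, _, r)) = leaves l @ leaves r

(* nodes : ('a,'b) tree -> 'a list, collected in inorder *)
fun nodes (Leaf _) = []
  | nodes (Node(l, x, r)) = nodes l @ [x] @ nodes r

(* a (string, int) tree: strings at internal nodes, ints at leaves *)
val t = Node(Leaf 1, "+", Node(Leaf 2, "*", Leaf 3))
(* leaves t = [1,2,3], nodes t = ["+","*"] *)
```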