    datatype T = C1 of T1 | C2 of T2 | ... | Cn of Tn

in which T is the new type being defined, the Ci are the new constructors being introduced, and the Ti are the types of the arguments supplied to the constructors.
    datatype direction = north | south | east | west

Note the similarity of this syntax with the syntax for context free grammars. This similarity is no coincidence. If you understand grammars, you can make sense of datatype declarations. The effect of the above declaration is to introduce the following constants:
    north : direction
    south : direction
    east : direction
    west : direction

The order makes no difference. (There is no notion of successor or ordinal value for constants.)
Having new data objects is nice, but they are useless unless we can define operations on them. (What good are integers unless you have operations and can define functions which manipulate them?) Fortunately, pattern matching can be used on constructors introduced via datatype declarations. Here is a simple function which takes a direction and returns the direction 90 degrees, clockwise, from that direction:
    fun rotate north = east
      | rotate east = south
      | rotate south = west
      | rotate west = north

As you might suspect, the type of this function is direction -> direction.
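As a small usage sketch (the helper name opposite is an assumption made up for illustration; it is not part of the notes), applying rotate twice should give the opposite direction:

```sml
datatype direction = north | south | east | west

(* rotate: the direction 90 degrees clockwise from the argument *)
fun rotate north = east
  | rotate east  = south
  | rotate south = west
  | rotate west  = north

(* opposite is a hypothetical helper: two clockwise quarter turns *)
val opposite = rotate o rotate

val d = opposite north
```

Here "o" is SML's built-in function composition, so opposite also has type direction -> direction.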
Constructors also admit an equality operation, so we can compare two expressions of type direction. For example, we could define rotate in the following, more cumbersome, way:
    fun rotate d = if d=north then east
                   else if d=east then south
                   else if d=south then west
                   else north

Since the datatype declaration is a declaration, we can define local datatypes:
    local
      datatype color = red | white | blue
    in
      ...
    end

Both the type name color and the constants red, white, and blue are only visible in the "..." region.
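To make the scoping concrete, here is one possible way to fill in the "..." region. The names toString and flagColors are assumptions chosen for this sketch. Only flagColors, whose type is string list, is visible after the end:

```sml
local
  datatype color = red | white | blue

  (* toString is a hypothetical helper, visible only inside this local block *)
  fun toString red   = "red"
    | toString white = "white"
    | toString blue  = "blue"
in
  (* flagColors is the only binding exported by the block *)
  val flagColors = map toString [red, white, blue]
end
```

After this declaration, neither color nor red, white, and blue can be mentioned by name, but flagColors can be used like any other list of strings.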
    datatype year = freshman | sophmore | junior | senior;
    datatype rank = assistant | associate | full;
    datatype person = student of (string * year)
                    | faculty of (string * rank)
                    | staff of (string * int);

The first two declarations introduce enumerated types. The third introduces the type person and three constructors:
    student : (string * year) -> person
    faculty : (string * rank) -> person
    staff : (string * int) -> person

Using standard typing rules, we can see how to build up objects of type person. For example, the following are all correct:
    student("Sally",junior)
    faculty("John", associate)
    staff("Suzie", 10)

The arguments to constructors can be arbitrary expressions, as long as they have the correct type. So the following are also valid:
    let val x = "Sally" in student(x, if true then junior else senior) end

    let val p = ("John"^"Hannan",associate) in faculty p end

(The infix operator "^" is the string concatenation operator.)
We can again define functions which use pattern matching to take data objects apart into their subparts. In fact, this is the only way to get at the subparts of a structured data object. (Remember lists? What is the only way to access the head and tail of a list?)
    fun name (student(N,_)) = N
      | name (staff(N,_)) = N
      | name (faculty(N,_)) = N;

    fun norespect (faculty(_,assistant)) = true
      | norespect (student(_,Y)) = (Y=freshman orelse Y=sophmore)
      | norespect (staff(_,R)) = (R<5)
      | norespect _ = false

The function norespect is of type person -> bool and returns true if the person is an assistant professor, a freshman or sophomore, or a staff member of grade less than 5.
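A short usage sketch, repeating the declarations so that it can be loaded on its own. The bindings people, names, and flags are assumptions introduced only for this example:

```sml
datatype year = freshman | sophmore | junior | senior
datatype rank = assistant | associate | full
datatype person = student of (string * year)
                | faculty of (string * rank)
                | staff of (string * int)

fun name (student(N,_)) = N
  | name (staff(N,_))   = N
  | name (faculty(N,_)) = N

fun norespect (faculty(_,assistant)) = true
  | norespect (student(_,Y)) = (Y=freshman orelse Y=sophmore)
  | norespect (staff(_,R)) = (R<5)
  | norespect _ = false

(* hypothetical sample data: one value per constructor *)
val people = [student("Sally",junior), faculty("John",assistant), staff("Suzie",10)]

val names = map name people        (* the name component of each person *)
val flags = map norespect people   (* one bool per person *)
```

Note that people is a perfectly ordinary person list, even though its elements were built with three different constructors.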
Note how this datatype is a kind of variant record or structure,
except that instead of explicit field names, we simply have subterms.
Instead of defining a record for faculty with field names name
and rank, we simply define a constructor which takes
a (name,rank) pair. Of course, it is up to the programmer
to remember the type (and intention) of the argument to the constructor.
(SML does have a distinct record datatype, but we will not be studying it.)

So far, we really haven't seen any great advantage of SML's datatypes
when compared to the datatypes of other languages (though I would argue
that the use of pattern matching provides an elegant means for writing
functions over datatypes). A distinguishing feature of SML's concrete
datatypes is that they can be recursively (inductively) defined.
Consider the following inductive definition of nonempty binary trees in which the leaves contain integers:
    datatype tree = Leaf of int | Node of (tree * tree);

That's it! Note that this is a recursive definition because the datatype is defined in terms of itself. We again use standard typing rules to see how to form data objects of type tree:
    Leaf 5 : tree
    Node(Leaf 3, Leaf 1) : tree
    Node(Node(Leaf 3, Leaf 1), Leaf 4) : tree
    let val t1 = Node(Leaf 3, Leaf 1) in Node(t1,t1) end : tree

Again, we define functions over the new datatype via pattern matching. Consider the following function for computing the maximum depth of a tree:
    fun maxdepth (Leaf _) = 1
      | maxdepth (Node(T1,T2)) = max(maxdepth T1, maxdepth T2) + 1;

(where max is the function which returns the maximum of two integers). Is this simple or what? This is just how you would define this function if you were to describe it in written language:
    The maxdepth of a leaf is 1; the maxdepth of a node is one greater than the maximum of the maxdepth of the two subtrees.

It just doesn't get any simpler. No pointer nonsense, no assignment to keep track of a maximum value, etc. Isn't SML wonderful?
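Here is a sketch of maxdepth that can be loaded as-is. It assumes that Int.max from the Standard ML Basis Library is an acceptable stand-in for the max function mentioned above; the bindings t and depth are just example names:

```sml
datatype tree = Leaf of int | Node of (tree * tree)

(* Int.max : int * int -> int comes from the Basis Library *)
fun maxdepth (Leaf _) = 1
  | maxdepth (Node(T1,T2)) = Int.max(maxdepth T1, maxdepth T2) + 1

(* one of the example trees built earlier *)
val t = Node(Node(Leaf 3, Leaf 1), Leaf 4)

val depth = maxdepth t
```

For this tree the left subtree has depth 2 and the right subtree has depth 1, so depth is 3.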
Not convinced? OK, more examples. Let's generalize the above definition of trees to allow integers also at internal nodes:
    datatype tree = Leaf of int | Node of (tree * int * tree);

The choice of putting the int argument between the two tree arguments is arbitrary, but we need to remember it. (We could have put it first: (int * tree * tree).) Here's the function which takes a tree and returns the list of integers representing the left-to-right inorder traversal of the tree:
    fun inorder (Leaf N) = N::nil
      | inorder (Node(T1,N,T2)) = (inorder T1) @ (N::nil) @ (inorder T2)

(The infix operator "@" is SML's built-in append function.) Again, is this simple or what? Don't you wish you had used SML in CSE 465? It's trivial to modify this function to obtain pre-order, post-order, right-to-left, you-name-it.
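For instance, here is one way the pre-order and post-order variants might look. The function names preorder and postorder and the sample tree t are assumptions for this sketch; only the position of (N::nil) in the second clause changes:

```sml
datatype tree = Leaf of int | Node of (tree * int * tree)

(* visit the node first, then the left and right subtrees *)
fun preorder (Leaf N) = N::nil
  | preorder (Node(T1,N,T2)) = (N::nil) @ (preorder T1) @ (preorder T2)

(* visit the left and right subtrees first, then the node *)
fun postorder (Leaf N) = N::nil
  | postorder (Node(T1,N,T2)) = (postorder T1) @ (postorder T2) @ (N::nil)

(* hypothetical sample tree *)
val t = Node(Node(Leaf 1, 2, Leaf 3), 4, Leaf 5)

val pre  = preorder t
val post = postorder t
```

For this tree, pre is [4,2,1,3,5] and post is [1,3,2,5,4].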
The syntax for a polymorphic version of this tree datatype is the following:
    datatype 'a tree = Leaf of 'a | Node of (('a tree) * 'a * ('a tree));

We explicitly include a type variable in the declaration. You can think of tree as a type constructor which takes any type (say t) and returns a new type (t tree). This notation is similar to the postfix notation for lists.
Note that the integer trees above are simply an instance of this type. But now we can also construct (and manipulate) trees containing reals, or booleans, or ... Any tree, however, can only contain one type of element. No mixing types in the trees. The type declaration clearly enforces this. If we instantiate 'a above to bool, then Node is a constructor which takes an argument of type bool tree * bool * bool tree. So its two subtrees are also bool trees.
The function inorder defined above still works with this new, polymorphic definition of trees, only now its type is 'a tree -> 'a list. (You must reload the definition of inorder after redefining the tree datatype.) We could define a function which assumes the nodes and leaves contain integers, for example:
    fun sumtree (Leaf n) = n
      | sumtree (Node(t1,n,t2)) = sumtree t1 + n + sumtree t2;

This function only works on int trees, so its type is int tree -> int.
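The following sketch puts the polymorphic pieces together. The sample values bt and nt and the result names are assumptions for illustration. The same inorder is used at two different element types, while sumtree is only applied to the int tree:

```sml
datatype 'a tree = Leaf of 'a | Node of (('a tree) * 'a * ('a tree))

(* inorder : 'a tree -> 'a list *)
fun inorder (Leaf N) = N::nil
  | inorder (Node(T1,N,T2)) = (inorder T1) @ (N::nil) @ (inorder T2)

(* sumtree : int tree -> int, because + defaults to int *)
fun sumtree (Leaf n) = n
  | sumtree (Node(t1,n,t2)) = sumtree t1 + n + sumtree t2

val bt = Node(Leaf true, false, Leaf true)   (* a bool tree *)
val nt = Node(Leaf 1, 2, Leaf 3)             (* an int tree *)

val bools = inorder bt
val ints  = inorder nt
val total = sumtree nt
```

Trying sumtree bt should be rejected by the type checker, since bt is a bool tree rather than an int tree.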
We can have more than one type variable in the declaration of a polymorphic datatype:
    datatype ('a,'b) tree = Leaf of 'b
                          | Node of ((('a,'b) tree) * 'a * (('a,'b) tree));

This defines binary trees whose leaves contain terms of type 'b and whose nodes contain terms of type 'a. How would you describe the following definition of trees?
    datatype ('a,'b) tree = Leaf of 'b
                          | Node of ((('b,'a) tree) * 'a * (('b,'a) tree));

This is a legal definition and actually useful in some settings.
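One possible reading, offered only as a sketch: because the subtrees have type ('b,'a) tree, the two type parameters swap roles at every level. In the example below (the names alt and rootLabel are assumptions for illustration), the root node holds a string, the nodes one level down hold ints, and so on in alternation:

```sml
datatype ('a,'b) tree = Leaf of 'b
                      | Node of ((('b,'a) tree) * 'a * (('b,'a) tree))

(* alt is a hypothetical example value.
   Level 0: node label is a string.
   Level 1: node label is an int, leaf content is a string.
   Level 2: leaf content is an int. *)
val alt : (string,int) tree =
  Node(Node(Leaf 1, 2, Leaf 3), "root", Leaf "leaf")

(* rootLabel is a hypothetical, non-recursive helper that looks only at the top level *)
fun rootLabel (Node(_, x, _)) = SOME x
  | rootLabel (Leaf _) = NONE

val top = rootLabel alt
```

rootLabel is deliberately non-recursive. A recursive traversal would have to call itself at the swapped type ('b,'a) tree, and SML's type inference does not support that kind of polymorphic recursion, so such functions need more care than inorder did.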