CSE301

Lab 3 (higher-order functions and type classes)

General instructions

Importing from the Standard Library

It will be convenient to import the module Data.List:

import Data.List

A few functions from this module are mentioned in the exercises below. If you’re curious about what else this module exports you can refer to hackage, or you can get an offline description by typing :browse Data.List into the GHC interpreter.

Zipping exercises

--- Zipping exercises

The function zip takes two lists and produces a list of pairs, by pairing off successive elements of both lists until one or both are exhausted. It can be defined as follows by recursion and pattern-matching:

zip :: [a] -> [b] -> [(a,b)]
zip []     _      = []
zip _      []     = []
zip (x:xs) (y:ys) = (x,y):zip xs ys

For example, zip [0..4] "hello" evaluates to [(0,'h'),(1,'e'),(2,'l'),(3,'l'),(4,'o')], as does zip [0..9] "hello".

There is also a higher-order version of zipping called zipWith, which takes a function f :: a -> b -> c as an extra argument and uses that to produce a list of cs instead of a list of (a,b)s. It can be defined as follows by recursion and pattern-matching:

zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
zipWith f []     _      = []
zipWith f _      []     = []
zipWith f (x:xs) (y:ys) = f x y:zipWith f xs ys

For example, sum (zipWith (*) xs ys) computes the dot product of two lists of numbers xs and ys.

As a warmup exercise, express zip in terms of zipWith:

-- Exercise 1a
my_zip :: [a] -> [b] -> [(a,b)]
my_zip = undefined

Now express zipWith in terms of map and zip, and possibly some other higher-order functions, but not using any recursion:

-- Exercise 1b
my_zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
my_zipWith = undefined

The module Data.List includes a function transpose :: [[a]] -> [[a]] that transposes the rows and the columns of a list of lists, viewed as a two-dimensional matrix. For example, transpose [[1,2,3],[4,5,6],[7,8,9],[0,1,2]] evaluates to [[1,4,7,0],[2,5,8,1],[3,6,9,2]]. The transpose operation may be seen as a kind of n-ary zip, where we zip the rows together to form the columns.

Implement transpose for yourself below. For this question you are allowed to use recursion, and you may assume that the lengths of all the lists in the input list of lists are equal.

-- Exercise 1c (optional)
my_transpose :: [[a]] -> [[a]]
my_transpose = undefined

Hint: if you get stuck you can always look at the source for the transpose function on hackage, but you should try to implement it for yourself first! Note that the version in the standard library does not make the assumption that the lengths of all the lists in the input list of lists are equal, skipping over a row once it has run out of elements, but you do not need to implement this functionality.

Folding exercises

--- Folding exercises

As discussed in class and in the lecture notes, the higher-order function foldr is a kind of jack-of-all-trades of list functions. Let’s recall the definition here:

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f v []     = v
foldr f v (x:xs) = f x (foldr f v xs)

For example, the function sum :: Num a => [a] -> a computing the sum of a list of numbers can be defined by sum = foldr (+) 0.

For the following exercises, you need to express the given functions using foldr, without using any recursion. A good strategy is to first figure out what should be the types of the arguments f and v to pass to foldr.

-- Exercise 2a
altsum :: Num a => [a] -> a
altsum = undefined

-- Exercise 2b
my_intersperse :: a -> [a] -> [a]
my_intersperse = undefined

-- Exercise 2c
my_tails :: [a] -> [[a]]
my_tails = undefined

-- Exercise 2d (optional)
my_isPrefixOf :: Eq a => [a] -> [a] -> Bool
my_isPrefixOf = undefined

-- Exercise 2e (optional)
my_dropWhile :: (a -> Bool) -> [a] -> [a]
my_dropWhile = undefined

Hint for 2e: Be careful not to define a variation of filter. You are allowed to invoke foldr and then do something with the result, as long as you do not use recursion. Note that there is more than one solution to this problem.

Theory question (optional): Recall the definition of foldl:

foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f v []     = v
foldl f v (x:xs) = foldl f (f v x) xs

Prove that if the operation f :: a -> a -> a is associative with unit v :: a (i.e., f and v define a monoid structure on a), then foldr f v xs = foldl f v xs.

-- Exercise 2f (optional)
-- (your proof here)

Difference lists

--- Difference lists

As we discussed in Lecture 1, computing the concatenation xs ++ ys of two lists takes time linear in the length of xs. In particular, while cons’ing an element onto the front of a list is an O(1) operation, “snoc”ing an element to the back of a list is O(n). Difference lists are another way of representing lists used in functional programming, which make it possible to perform concatenation in constant time, while allowing conversion back to an ordinary list in linear time.

A difference list is a particular kind of function from lists to lists. Let us therefore introduce the following type synonym:

type DiffList a = [a] -> [a]

The terminology comes from the idea that we can think of the function as representing a list by the “difference” between the input list and the output list. In particular, every list induces a difference list as follows:

toDL :: [a] -> DiffList a
toDL xs = (xs++)

That is, xs is represented by the action of concatenating itself to the beginning of any input list. Conversely, a difference list can be converted back into an ordinary list by applying it to the empty list:

fromDL :: DiffList a -> [a]
fromDL dxs = dxs []

Observe that fromDL (toDL xs) = xs, since xs ++ [] = xs.

An elegant aspect of the difference list representation is that concatenation is implemented simply by function composition. In particular, both cons’ing and snoc’ing can be performed in O(1) time.

cons :: a -> DiffList a -> DiffList a
cons x dxs = (x:) . dxs

snoc :: DiffList a -> a -> DiffList a
snoc dxs x = dxs . (x:)

Write a function toDLrev :: [a] -> DiffList a that represents a list xs as the action of appending the reversal of itself to any input list. Then use this function to define a function my_reverse :: [a] -> [a] that reverses a list in linear time.

-- Exercise 3a
toDLrev :: [a] -> DiffList a
toDLrev = undefined

-- Exercise 3b
my_reverse :: [a] -> [a]
my_reverse = undefined

(You can write toDLrev using either recursion or foldr, but my_reverse should not be defined recursively.)

To test that your list-reversal routine indeed runs in linear time, you can turn on the :set +s flag in ghci to get it to print resource usage info, and then try evaluating length (my_reverse [1..n]) for large values of n. You can try comparing your implementation to the naive, quadratic implementation of list-reversal below:

naive_reverse :: [a] -> [a]
naive_reverse []     = []
naive_reverse (x:xs) = naive_reverse xs ++ [x]

And you can also try comparing it to the built-in reverse function. For example, here are some comparisons running on my laptop:

*Main> :set +s
*Main> length (my_reverse [1..10^3])
1000
(0.04 secs, 302,368 bytes)
*Main> length (naive_reverse [1..10^3])
1000
(0.04 secs, 28,259,072 bytes)
*Main> length (my_reverse [1..10^4])
10000
(0.03 secs, 2,391,272 bytes)
*Main> length (naive_reverse [1..10^4])
10000
(1.60 secs, 4,295,065,576 bytes)
*Main> length (reverse [1..10^4])
10000
(0.02 secs, 1,031,176 bytes)
*Main> length (my_reverse [1..10^5])
100000
(0.12 secs, 23,272,048 bytes)
*Main> length (reverse [1..10^5])
100000
(0.03 secs, 9,671,952 bytes)
*Main> length (my_reverse [1..10^6])
1000000
(0.69 secs, 232,072,784 bytes)
*Main> length (reverse [1..10^6])
1000000
(0.20 secs, 96,072,688 bytes)

Observe that the built-in reverse is somewhat faster than our implementation of my_reverse using difference lists, but that the latter still exhibits linear behavior, as opposed to naive_reverse which is quadratic (and would take too long to run on lists of length 10^5).

Theory question: Do fromDL and toDL define an isomorphism between the types [a] and DiffList a? Explain why or why not as a comment below.

-- Exercise 3c
-- (your explanation here)

One common application of difference lists in Haskell is pretty-printing. For example, when defining instances of the Show class, rather than implementing the operation

show :: a -> String

that converts a value of the given type to a readable string directly, it is considered better practice to define the operation

showsPrec :: Int -> a -> ShowS

from which show can be derived. Besides taking an additional integer as argument (corresponding to the operator precedence of the enclosing context), the return type of showsPrec differs from that of show since it constructs a difference list: note ShowS is just a synonym for String -> String. The show operation is then recovered using the default definition show x = showsPrec 10 x "".

For example, as explained in the documentation for Show, given the declarations

infixr 5 :^:
data Tree a = Leaf a | Tree a :^: Tree a
  deriving (Show)

the derived instance of Show is equivalent to

instance (Show a) => Show (Tree a) where
    showsPrec d (Leaf m) = showParen (d > app_prec) $
         showString "Leaf " . showsPrec (app_prec+1) m
      where app_prec = 10

    showsPrec d (u :^: v) = showParen (d > up_prec) $
         showsPrec (up_prec+1) u .
         showString " :^: "      .
         showsPrec (up_prec+1) v
      where up_prec = 5

[Some explanations for understanding all of the code: showParen :: Bool -> ShowS -> ShowS optionally wraps a string-represented-as-a-difference-list in parentheses depending on the value of its boolean argument, while showString :: String -> ShowS is equivalent to the function toDL we defined above. Finally, ($) :: (a -> b) -> a -> b is defined in the Prelude by f $ x = f x — in other words, it is just another name for function application, but defined as an infix operator, which sometimes allows you to omit parentheses in your code.]

Regular expression matching in continuation-passing style

--- Regular expression matching

Inspired by the article “Proof-directed debugging” by Robert Harper.

Recall that regular expressions are a notation for describing languages of strings, defining a class of languages equivalent to those recognized by finite-state automata. In the classical definition, the set of regular expressions over an alphabet \(\Sigma\) is defined inductively as follows:

We can translate this inductive definition directly into a Haskell data type:

data RegExp = Zero | One
            | C Char
            | Plus RegExp RegExp | Times RegExp RegExp
            | Star RegExp
  deriving (Show,Eq)

Any regular expression defines a language of strings as follows:

\[ \begin{eqnarray} L(0) &=& \emptyset \\ L(1) &=& \{ \epsilon \} \\ L(a) &=& \{ a \} \\ L(e_1 + e_2) &=& L(e_1) \cup L(e_2) \\ L(e_1 e_2) &=& L(e_1)L(e_2) = \{ w_1 w_2 \mid w_1 \in L(e_1), w_2 \in L(e_2) \} \\ L(e^*) &=& L(e)^* = \bigcup_{n \ge 0} \{ w_1 \cdots w_n \mid w_i \in L(e)\text{ for all }1 \le i \le n \} \end{eqnarray} \]

Since the language generated by a regular expression is a potentially infinite set of strings, we are not going to try to translate this definition directly into Haskell. Instead, the goal of this exercise will be to write a regular expression matcher, corresponding to a function

accept :: RegExp -> String -> Bool

that takes a regular expression \(e\) and a string \(w\) as input, and determines whether \(w \in L(e)\).

The technique we will be using is called continuation-passing style (or “CPS”). Rather than writing accept directly, we will define a higher-order function

acc :: RegExp -> String -> (String -> Bool) -> Bool

taking as an extra argument a function k :: String -> Bool, called the continuation. The idea is that acc e w k will try to match \(e\) against some initial prefix of \(w\), and then call the continuation \(k\) on the remaining suffix, in all possible ways. More precisely, we will implement acc e w k so as to maintain the following invariants:

Given such an implementation of acc, we can then define our regular expression matcher by:

accept e w = acc e w null

where we have taken the “initial continuation” to be the function null :: String -> Bool, since we want to ensure that \(e\) matches the entire word \(w\) and not just a prefix. Assuming acc matches the specification above, then accept e w will evaluate to True just in case \(w \in L(e)\), and otherwise it will evaluate to False.

Implement the function acc for Star-free regular expressions by completing the definition below (the cases for 0 and 1 have already been provided, for reference).

-- Exercise 4a
acc :: RegExp -> String -> (String -> Bool) -> Bool
acc Zero          w k = False
acc One           w k = k w
acc (C a)         w k = undefined
acc (Plus e1 e2)  w k = undefined
acc (Times e1 e2) w k = undefined

For example, after implementing acc, you should see the following results:

*Main> accept (Times (Plus (C 'a') One) (Plus One (C 'b'))) "ab"
True
*Main> accept (Times (Plus (C 'a') One) (Plus One (C 'b'))) "b"
True
*Main> accept (Times (Plus (C 'a') One) (Plus One (C 'b'))) "ba"
False

Now extend acc to work for expressions containing Star:

-- Exercise 4b (optional)
acc (Star e)      w k = undefined

For example, you should see the following results:

*Main> accept (Times (Star (Times (C 'a') (C 'b'))) (C 'a')) "ababa"
True
*Main> accept (Times (Star (Times (C 'a') (C 'b'))) (C 'a')) "abab"
False
*Main> accept (Star (Plus (Star (C 'c')) (Times (C 'a') (C 'b')))) "abccabcccab"
True

Hint: you need to be careful to ensure that your matcher does not go into an infinite loop! If it does, try to understand why and find a resolution. If you get stuck or if you are just curious you can read Harper’s article linked to above for a suggestion of one way to resolve the issue. (But be warned there is a typo in Section 4 of the paper, in the definition of the function \(r^-\). The English description of the set of non-empty strings matching \(r_1 r_2\) is correct, but the equation for \((r_1 r_2)^-\) is wrong.)

Finally, we should mention that although continuation-passing style is a very powerful technique that can be adapted to many different problems, matching regular expressions this way is in fact not a good idea! To see why, you can begin by evaluating the expression

accept (Star (Plus (Times (C 'a') (C 'a')) (Times (C 'a') (C 'a')))) (replicate 101 'a')

in GHC, and then (while you are waiting for it to finish) read this article by Russ Cox.