CSE301

A few functions from this module are mentioned in the exercises below. If you’re curious about what else this module exports you can refer to hackage, or you can get an offline description by typing :browse Data.List into the GHC interpreter.

Zipping exercises

The function zip takes two lists and produces a list of pairs, by pairing off successive elements of both lists until one or both are exhausted. It can be defined as follows by recursion and pattern-matching:

For example, zip [0..4] "hello" evaluates to [(0,'h'),(1,'e'),(2,'l'),(3,'l'),(4,'o')], as does zip [0..9] "hello".

There is also a higher-order version of zipping called zipWith, which takes a function f :: a -> b -> c as an extra argument and uses that to produce a list of cs instead of a list of (a,b)s. It can be defined as follows by recursion and pattern-matching:

For example, sum (zipWith (*) xs ys) computes the dot product of two lists of numbers xs and ys.

As a warmup exercise, express zip in terms of zipWith:

-- Exercise 1a
my_zip :: [a] -> [b] -> [(a,b)]
my_zip = undefined

Now express zipWith in terms of map and zip, and possibly some other higher-order functions, but not using any recursion:

-- Exercise 1b
my_zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
my_zipWith = undefined

The module Data.List includes a function transpose :: [[a]] -> [[a]] that transposes the rows and the columns of a list of lists, viewed as a two-dimensional matrix. For example, transpose [[1,2,3],[4,5,6],[7,8,9],[0,1,2]] evaluates to [[1,4,7,0],[2,5,8,1],[3,6,9,2]]. The transpose operation may be seen as a kind of n-ary zip, where we zip the rows together to form the columns.

Implement transpose for yourself below. For this question you are allowed to use recursion, and you may assume that the lengths of all the lists in the input list of lists are equal.

-- Exercise 1c (optional)
my_transpose :: [[a]] -> [[a]]
my_transpose = undefined

Hint: if you get stuck you can always look at the source for the transpose function on hackage, but you should try to implement it for yourself first! Note that the version in the standard library does not make the assumption that the lengths of all the lists in the input list of lists are equal, skipping over a row once it has run out of elements, but you do not need to implement this functionality.

Folding exercises

As discussed in class and in the lecture notes, the higher-order function foldr is a kind of jack-of-all-trades of list functions. Let’s recall the definition here:

For example, the function sum :: Num a => [a] -> a computing the sum of a list of numbers can be defined by sum = foldr (+) 0.

For the following exercises, you need to express the given functions using foldr, without using any recursion. A good strategy is to first figure out what should be the types of the arguments f and v to pass to foldr.

A function bigxor :: [Bool] -> Bool, which computes the exclusive-or of all the booleans in a list. Examples: bigxor [True,True,True] = True and bigxor [True,False,True] = False.
A function altsum :: Num a => [a] -> a computing the alternating sum of a list of numbers. Example: altsum [1..5] = 1 - 2 + 3 - 4 + 5 = 1 - (2 - (3 - (4 - (5 - 0)))) = 3.
The standard library function intersperse :: a -> [a] -> [a], which intersperses a given element between the elements of a list. Example: intersperse ',' "abcde" = "a,b,c,d,e".
The standard library function tails :: [a] -> [[a]], which returns the list of all suffixes of a list. Example: tails "abcde" = ["abcde","bcde","cde","de","e",""].
(Optional) The standard library function isPrefixOf :: Eq a => [a] -> [a] -> Bool determining whether the first list is a prefix of the second. Examples: isPrefixOf "abc" "abcde" = True but isPrefixOf "abc" "def" = False. Hint: since isPrefix is a curried function, you should take b to be a function type when defining f :: a -> b -> b and v :: b.
(Optional) The Prelude function dropWhile :: (a -> Bool) -> [a] -> [a] which returns the remaining suffix of a list after dropping the longest prefix of elements that satisfy a given predicate. Examples: dropWhile (<3) [1..5] = [3,4,5] and dropWhile (/=3) [1..5] = [3,4,5]. Hint: Be careful not to define a variation of filter! You are allowed to invoke foldr and then do something with the result, as long as you do not use recursion. There is more than one possible solution to this problem.

-- Exercise 2a
bigxor :: [Bool] -> Bool
bigxor = undefined

-- Exercise 2b
altsum :: Num a => [a] -> a
altsum = undefined

-- Exercise 2c
my_intersperse :: a -> [a] -> [a]
my_intersperse = undefined

-- Exercise 2d
my_tails :: [a] -> [[a]]
my_tails = undefined

-- Exercise 2e (optional)
my_isPrefixOf :: Eq a => [a] -> [a] -> Bool
my_isPrefixOf = undefined

-- Exercise 2f (optional)
my_dropWhile :: (a -> Bool) -> [a] -> [a]
my_dropWhile = undefined

Theory question (optional): Recall the definition of foldl:

foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f v []     = v
foldl f v (x:xs) = foldl f (f v x) xs

Prove that if the operation f :: a -> a -> a is associative with unit v :: a (i.e., f and v define a monoid structure on a), then foldr f v xs = foldl f v xs.

-- Exercise 2g (optional)
-- (your proof here)

Difference lists

As we discussed in Lecture 1, computing the concatenation xs ++ ys of two lists takes time linear in the length of xs. In particular, while cons’ing an element onto the front of a list is an O(1) operation, “snoc”ing an element to the back of a list is O(n). The technique of difference lists is an alternative way of representing lists in functional programming, which make it possible to perform concatenation in constant time, while allowing conversion back to an ordinary list in linear time.

A difference list is a particular kind of function from lists to lists. Let us therefore introduce the following type synonym:

The terminology comes from the idea that we can think of the function as representing a list by the “difference” between the input list and the output list. In particular, every list induces a difference list as follows:

That is, xs is represented by the action of concatenating itself to the beginning of any input list. Conversely, a difference list can be converted back into an ordinary list by applying it to the empty list:

Observe that fromDL (toDL xs) = xs, since xs ++ [] = xs (as we proved in Exercise 3.1 of Lab 1!).

An elegant aspect of the difference list representation is that concatenation is implemented simply by function composition. In particular, both cons’ing and snoc’ing can be performed in O(1) time.

Write a function toDLrev :: [a] -> DiffList a that represents a list xs as the action of appending the reversal of itself to any input list. Then use this function to define a function my_reverse :: [a] -> [a] that reverses a list in linear time.

-- Exercise 3a
toDLrev :: [a] -> DiffList a
toDLrev = undefined

-- Exercise 3b
my_reverse :: [a] -> [a]
my_reverse = undefined

(You can write toDLrev using either recursion or foldr, but my_reverse should not be defined recursively.)

To test that your list-reversal routine indeed runs in linear time, you can turn on the :set +s flag in ghci to get it to print resource usage info, and then try evaluating length (my_reverse [1..n]) for large values of n. You can try comparing your implementation to the naive, quadratic implementation of list-reversal below:

And you can also try comparing it to the built-in reverse function from the Haskell Prelude. For example, here are some comparisons running on my laptop:

Observe that the built-in reverse function is somewhat faster than our implementation based on difference lists, but that the latter still exhibits linear behavior (as opposed to the naive version which is quadratic, and takes too long to run on lists of length 10^5).

Theory question: Do fromDL and toDL define an isomorphism between the types [a] and DiffList a? Explain why or why not as a comment below. [Added clarification: remember that DiffList a is just a type synonym for [a] -> [a], so equivalently, this question is asking whether fromDL and toDL define an isomorphism between the type of lists [a] and the type of endofunctions on lists [a] -> [a].]

-- Exercise 3c
-- (your explanation here)

One common application of difference lists in Haskell is pretty-printing. For example, when defining instances of the Show class, rather than implementing the operation

that converts a value of the given type to a readable string directly, it is considered better practice to define the operation

from which show can be derived. Besides taking an additional integer as argument (corresponding to the operator precedence of the enclosing context), the return type of showsPrec differs from that of show since it constructs a difference list: note ShowS is just a synonym for String -> String. The show operation is then recovered using the default definition show x = showsPrec 10 x "".

[Some explanations for understanding all of the code: showParen :: Bool -> ShowS -> ShowS optionally wraps a string-represented-as-a-difference-list in parentheses depending on the value of its boolean argument, while showString :: String -> ShowS is equivalent to the function toDL we defined above. Finally, ($) :: (a -> b) -> a -> b is defined in the Prelude by f $ x = f x — in other words, it is just another name for function application, but defined as an infix operator, which sometimes allows you to omit parentheses in your code.]

Regular expression matching in continuation-passing style

Recall that regular expressions are a notation for describing sets of possible strings, equivalent in expressive power to finite-state automata. In the classical definition, the regular expressions over an alphabet $\Sigma$ are defined inductively as follows:

Every regular expression $e$ generates a corresponding language of strings $L(e)$ defined as follows:

\[ \begin{eqnarray} L(0) &=& \emptyset \\ L(1) &=& \{ \epsilon \} \\ L(a) &=& \{ a \} \\ L(e_1 + e_2) &=& L(e_1) \cup L(e_2) \\ L(e_1 e_2) &=& L(e_1)L(e_2) = \{ w_1 w_2 \mid w_1 \in L(e_1), w_2 \in L(e_2) \} \\ L(e^*) &=& L(e)^* = \bigcup_{n \ge 0} \{ w_1 \cdots w_n \mid w_i \in L(e)\text{ for all }1 \le i \le n \} \end{eqnarray} \]

Since the language generated by a regular expression is a potentially infinite set of strings, we are not going to try to translate this definition directly into Haskell. Instead, the goal of this exercise will be to write a regular expression matcher, corresponding to a function

that takes a regular expression $e$ and a string $w$ as input, and determines whether $w \in L(e)$.

The technique we will be using is called continuation-passing style. Rather than writing accept directly, we will define a higher-order function

taking as an extra argument a function k :: String -> Bool, called the continuation. The idea is that acc e w k will try to match $e$ against some initial prefix of $w$ and then try calling the continuation $k$ on the remaining suffix, if necessary backtracking to try another prefix. Formally, we will implement acc e w k so as to maintain the following invariants:

Given such an implementation of acc, we can then define our regular expression matcher by:

where we have taken the “initial continuation” to be the function null :: String -> Bool, since we want to ensure that $e$ matches the entire word $w$ and not just a prefix. Assuming acc matches the specification above, then accept e w will evaluate to True just in case $w \in L(e)$, and otherwise it will evaluate to False. (You should prove this for yourself!)

Implement the function acc for Star-free regular expressions by completing the definition below (the cases for 0 and 1 have already been provided, for reference).

-- Exercise 4a
acc :: RegExp -> String -> (String -> Bool) -> Bool
acc Zero          w k = False
acc One           w k = k w
acc (C a)         w k = undefined
acc (Plus e1 e2)  w k = undefined
acc (Times e1 e2) w k = undefined

For example, after implementing acc, you should see the following results:

ghci> accept (Times (Plus (C 'a') One) (Plus One (C 'b'))) "ab"
True
ghci> accept (Times (Plus (C 'a') One) (Plus One (C 'b'))) "b"
True
ghci> accept (Times (Plus (C 'a') One) (Plus One (C 'b'))) "ba"
False

Now extend acc to work for expressions containing Star:

-- Exercise 4b (optional)
acc (Star e)      w k = undefined

For example, you should see the following results:

ghci> accept (Times (Star (Times (C 'a') (C 'b'))) (C 'a')) "ababa"
True
ghci> accept (Times (Star (Times (C 'a') (C 'b'))) (C 'a')) "abab"
False
ghci> accept (Star (Plus (Star (C 'c')) (Times (C 'a') (C 'b')))) "abccabcccab"
True

Hint: you need to be careful to ensure that your matcher does not go into an infinite loop! If it does, try to understand why and find a resolution. If you get stuck or if you are just curious you can read Harper’s article linked to above for a suggestion of one way to resolve the issue. (But be warned there is a typo in Section 4 of the paper, in the definition of the function $r^-$. The English description of the set of non-empty strings matching $r_1 r_2$ is correct, but the equation for $(r_1 r_2)^-$ is wrong.)

Finally, we should warn that although continuation-passing style is a very powerful and general technique, matching regular expressions this way is in fact not a good idea! To understand why, you can begin by evaluating the expression

accept (Star (Plus (Times (C 'a') (C 'a')) (Times (C 'a') (C 'a')))) (replicate 101 'a')

in GHC, and then (while you are waiting for it to finish) read this article by Russ Cox.

Lab 2 (higher-order functions and type classes)

General instructions

Importing from the Standard Library

Zipping exercises

Folding exercises

Difference lists

Regular expression matching in continuation-passing style