The exercises below can be done in any order, but you should try to complete the non-optional exercises if you can before the deadline. Sections containing mandatory exercises are indicated in blue, sections containing optional ones in red, and sections containing a mix of both in purple.
You should not use any external libraries, but it is okay to import functions from the standard library.
Download the template file Lab2.hs
, put all your solutions in this
file, and upload it to Moodle before the deadline. Make sure that you
submit a working Haskell file!
It will be convenient to import the module Data.List:
import Data.List
A few functions from this module are mentioned in the exercises
below. If you’re curious about what else this module exports you can
refer to hackage,
or you can get an offline description by typing
:browse Data.List
into the GHC interpreter.
--- Zipping exercises
The function zip
takes two lists and produces a list of
pairs, by pairing off successive elements of both lists until one or
both are exhausted. It can be defined as follows by recursion and
pattern-matching:
zip :: [a] -> [b] -> [(a,b)]
zip [] _ = []
zip _ [] = []
zip (x:xs) (y:ys) = (x,y):zip xs ys
For example, zip [0..4] "hello"
evaluates to
[(0,'h'),(1,'e'),(2,'l'),(3,'l'),(4,'o')]
, as does
zip [0..9] "hello"
.
There is also a higher-order version of zipping called
zipWith
, which takes a function
f :: a -> b -> c
as an extra argument and uses that
to produce a list of c
s instead of a list of
(a,b)
s. It can be defined as follows by recursion and
pattern-matching:
zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
zipWith f [] _ = []
zipWith f _ [] = []
zipWith f (x:xs) (y:ys) = f x y:zipWith f xs ys
For example, sum (zipWith (*) xs ys)
computes the dot product of two
lists of numbers xs
and ys
.
As a warmup exercise, express zip
in terms of
zipWith
:
-- Exercise 1a
my_zip :: [a] -> [b] -> [(a,b)]
= undefined my_zip
Now express zipWith
in terms of map
and
zip
, and possibly some other higher-order functions, but
not using any recursion:
-- Exercise 1b
my_zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
= undefined my_zipWith
The module Data.List includes a function
transpose :: [[a]] -> [[a]]
that transposes the rows and
the columns of a list of lists, viewed as a two-dimensional matrix. For
example, transpose [[1,2,3],[4,5,6],[7,8,9],[0,1,2]]
evaluates to [[1,4,7,0],[2,5,8,1],[3,6,9,2]]
. The
transpose
operation may be seen as a kind of n-ary
zip
, where we zip the rows together to form the
columns.
Implement transpose
for yourself below. For this
question you are allowed to use recursion, and you may assume that the
lengths of all the lists in the input list of lists are equal.
-- Exercise 1c (optional)
my_transpose :: [[a]] -> [[a]]
= undefined my_transpose
Hint: if you get stuck you can always look at the source for the
transpose
function on hackage, but you should try to implement it for yourself first! Note that the version in the standard library does not make the assumption that the lengths of all the lists in the input list of lists are equal, skipping over a row once it has run out of elements, but you do not need to implement this functionality.
--- Folding exercises
As discussed in class and in the lecture notes, the higher-order
function foldr
is a kind of jack-of-all-trades of list
functions. Let’s recall the definition here:
foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f v [] = v
foldr f v (x:xs) = f x (foldr f v xs)
For example, the function sum :: Num a => [a] -> a
computing the sum of a list of numbers can be defined by
sum = foldr (+) 0
.
For the following exercises, you need to express the given functions
using foldr
, without using any recursion. A good strategy
is to first figure out what should be the types of the
arguments f
and v
to pass to
foldr
.
A function bigxor :: [Bool] -> Bool
, which
computes the exclusive-or of all the booleans in a list. Examples:
bigxor [True,True,True] = True
and
bigxor [True,False,True] = False
.
A function altsum :: Num a => [a] -> a
computing the alternating sum of a list of numbers. Example:
altsum [1..5] = 1 - 2 + 3 - 4 + 5 = 1 - (2 - (3 - (4 - (5 - 0)))) = 3
.
The standard library function
intersperse :: a -> [a] -> [a]
, which intersperses a
given element between the elements of a list. Example:
intersperse ',' "abcde" = "a,b,c,d,e"
.
The standard library function
tails :: [a] -> [[a]]
, which returns the list of all
suffixes of a list. Example:
tails "abcde" = ["abcde","bcde","cde","de","e",""]
.
(Optional) The standard library function
isPrefixOf :: Eq a => [a] -> [a] -> Bool
determining whether the first list is a prefix of the second. Examples:
isPrefixOf "abc" "abcde" = True
but
isPrefixOf "abc" "def" = False
. Hint: since
isPrefix
is a curried function, you should take
b
to be a function type when defining
f :: a -> b -> b
and
v :: b
.
(Optional) The Prelude function
dropWhile :: (a -> Bool) -> [a] -> [a]
which
returns the remaining suffix of a list after dropping the longest prefix
of elements that satisfy a given predicate. Examples:
dropWhile (<3) [1..5] = [3,4,5]
and
dropWhile (/=3) [1..5] = [3,4,5]
. Hint: Be careful not
to define a variation of filter
! You are allowed to invoke
foldr
and then do something with the result, as long as you
do not use recursion. There is more than one possible solution to this
problem.
-- Exercise 2a
bigxor :: [Bool] -> Bool
= undefined
bigxor
-- Exercise 2b
altsum :: Num a => [a] -> a
= undefined
altsum
-- Exercise 2c
my_intersperse :: a -> [a] -> [a]
= undefined
my_intersperse
-- Exercise 2d
my_tails :: [a] -> [[a]]
= undefined
my_tails
-- Exercise 2e (optional)
my_isPrefixOf :: Eq a => [a] -> [a] -> Bool
= undefined
my_isPrefixOf
-- Exercise 2f (optional)
my_dropWhile :: (a -> Bool) -> [a] -> [a]
= undefined my_dropWhile
Theory question (optional): Recall the definition of
foldl
:
foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f v [] = v
foldl f v (x:xs) = foldl f (f v x) xs
Prove that if the operation f :: a -> a -> a
is
associative with unit v :: a
(i.e., f
and
v
define a monoid structure on
a
), then foldr f v xs = foldl f v xs
.
-- Exercise 2g (optional)
-- (your proof here)
--- Difference lists
As we discussed in Lecture 1,
computing the concatenation xs ++ ys
of two lists takes
time linear in the length of xs
. In particular, while
cons’ing an element onto the front of a list is an O(1) operation,
“snoc”ing an element to the back of a list is O(n). The technique of
difference lists is an alternative way of representing
lists in functional programming, which make it possible to perform
concatenation in constant time, while allowing conversion back to an
ordinary list in linear time.
A difference list is a particular kind of function from lists to lists. Let us therefore introduce the following type synonym:
type DiffList a = [a] -> [a]
The terminology comes from the idea that we can think of the function as representing a list by the “difference” between the input list and the output list. In particular, every list induces a difference list as follows:
toDL :: [a] -> DiffList a
= (xs++) toDL xs
That is, xs
is represented by the action of
concatenating itself to the beginning of any input list. Conversely, a
difference list can be converted back into an ordinary list by applying
it to the empty list:
fromDL :: DiffList a -> [a]
= dxs [] fromDL dxs
Observe that fromDL (toDL xs) = xs
, since
xs ++ [] = xs
(as we proved in Exercise 3.1 of Lab 1!).
An elegant aspect of the difference list representation is that concatenation is implemented simply by function composition. In particular, both cons’ing and snoc’ing can be performed in O(1) time.
cons :: a -> DiffList a -> DiffList a
= (x:) . dxs
cons x dxs
snoc :: DiffList a -> a -> DiffList a
= dxs . (x:) snoc dxs x
Write a function toDLrev :: [a] -> DiffList a
that
represents a list xs
as the action of appending the
reversal of itself to any input list. Then use this function to
define a function my_reverse :: [a] -> [a]
that reverses
a list in linear time.
-- Exercise 3a
toDLrev :: [a] -> DiffList a
= undefined
toDLrev
-- Exercise 3b
my_reverse :: [a] -> [a]
= undefined my_reverse
(You can write toDLrev
using either recursion or foldr,
but my_reverse
should not be defined recursively.)
To test that your list-reversal routine indeed runs in linear time,
you can turn on the :set +s
flag in ghci to get it to print
resource usage info, and then try evaluating
length (my_reverse [1..n])
for large values of
n
. You can try comparing your implementation to the naive,
quadratic implementation of list-reversal below:
naive_reverse :: [a] -> [a]
= []
naive_reverse [] :xs) = naive_reverse xs ++ [x] naive_reverse (x
And you can also try comparing it to the built-in
reverse
function from the Haskell Prelude. For example,
here are some comparisons running on my laptop:
> :set +s
ghci> length (my_reverse [1..10^3])
ghci1000
0.04 secs, 302,368 bytes)
(> length (naive_reverse [1..10^3])
ghci1000
0.04 secs, 28,259,072 bytes)
(> length (my_reverse [1..10^4])
ghci10000
0.03 secs, 2,391,272 bytes)
(> length (naive_reverse [1..10^4])
ghci10000
1.60 secs, 4,295,065,576 bytes)
(> length (reverse [1..10^4])
ghci10000
0.02 secs, 1,031,176 bytes)
(> length (my_reverse [1..10^5])
ghci100000
0.12 secs, 23,272,048 bytes)
(> length (reverse [1..10^5])
ghci100000
0.03 secs, 9,671,952 bytes)
(> length (my_reverse [1..10^6])
ghci1000000
0.69 secs, 232,072,784 bytes)
(> length (reverse [1..10^6])
ghci1000000
0.20 secs, 96,072,688 bytes) (
Observe that the built-in reverse function is somewhat faster than our implementation based on difference lists, but that the latter still exhibits linear behavior (as opposed to the naive version which is quadratic, and takes too long to run on lists of length 10^5).
Theory question: Do fromDL
and
toDL
define an isomorphism between the types
[a]
and DiffList a
? Explain why or why not as
a comment below. [Added clarification: remember that
DiffList a
is just a type synonym for
[a] -> [a]
, so equivalently, this question is asking
whether fromDL
and toDL
define an isomorphism
between the type of lists [a]
and the type of endofunctions
on lists [a] -> [a]
.]
-- Exercise 3c
-- (your explanation here)
One common application of difference lists in Haskell is pretty-printing. For example, when defining instances of the Show class, rather than implementing the operation
show :: a -> String
that converts a value of the given type to a readable string directly, it is considered better practice to define the operation
showsPrec :: Int -> a -> ShowS
from which show
can be derived. Besides taking an
additional integer as argument (corresponding to the operator precedence
of the enclosing context), the return type of showsPrec
differs from that of show
since it constructs a difference
list: note ShowS
is just a synonym for
String -> String
. The show
operation is
then recovered using the default definition
show x = showsPrec 10 x ""
.
For example, as explained in the documentation for Show, given the declarations
infixr 5 :^:
data Tree a = Leaf a | Tree a :^: Tree a
deriving (Show)
the derived instance of Show is equivalent to
instance (Show a) => Show (Tree a) where
showsPrec d (Leaf m) = showParen (d > app_prec) $
showString "Leaf " . showsPrec (app_prec+1) m
where app_prec = 10
showsPrec d (u :^: v) = showParen (d > up_prec) $
showsPrec (up_prec+1) u .
showString " :^: " .
showsPrec (up_prec+1) v
where up_prec = 5
[Some explanations for understanding all of the code:
showParen :: Bool -> ShowS -> ShowS
optionally wraps
a string-represented-as-a-difference-list in parentheses depending on
the value of its boolean argument, while
showString :: String -> ShowS
is equivalent to the
function toDL
we defined above. Finally,
($) :: (a -> b) -> a -> b
is defined in the
Prelude by f $ x = f x
— in other words, it is just another
name for function application, but defined as an infix operator, which
sometimes allows you to omit parentheses in your code.]
--- Regular expression matching
Inspired by the article “Proof-directed debugging” by Robert Harper.
Recall that regular expressions are a notation for describing sets of possible strings, equivalent in expressive power to finite-state automata. In the classical definition, the regular expressions over an alphabet \(\Sigma\) are defined inductively as follows:
We can translate this inductive definition directly into a Haskell data type:
data RegExp = Zero | One
| C Char
| Plus RegExp RegExp | Times RegExp RegExp
| Star RegExp
deriving (Show,Eq)
Every regular expression \(e\) generates a corresponding language of strings \(L(e)\) defined as follows:
\[ \begin{eqnarray} L(0) &=& \emptyset \\ L(1) &=& \{ \epsilon \} \\ L(a) &=& \{ a \} \\ L(e_1 + e_2) &=& L(e_1) \cup L(e_2) \\ L(e_1 e_2) &=& L(e_1)L(e_2) = \{ w_1 w_2 \mid w_1 \in L(e_1), w_2 \in L(e_2) \} \\ L(e^*) &=& L(e)^* = \bigcup_{n \ge 0} \{ w_1 \cdots w_n \mid w_i \in L(e)\text{ for all }1 \le i \le n \} \end{eqnarray} \]
Since the language generated by a regular expression is a potentially infinite set of strings, we are not going to try to translate this definition directly into Haskell. Instead, the goal of this exercise will be to write a regular expression matcher, corresponding to a function
accept :: RegExp -> String -> Bool
that takes a regular expression \(e\) and a string \(w\) as input, and determines whether \(w \in L(e)\).
The technique we will be using is called continuation-passing
style. Rather than writing accept
directly, we
will define a higher-order function
acc :: RegExp -> String -> (String -> Bool) -> Bool
taking as an extra argument a function
k :: String -> Bool
, called the continuation.
The idea is that acc e w k
will try to match \(e\) against some initial prefix of \(w\) and then try calling the continuation
\(k\) on the remaining suffix, if
necessary backtracking to try another prefix. Formally, we will
implement acc e w k
so as to maintain the following
invariants:
If there exists some splitting of the input string \(w = w_1 w_2\), with \(w_1 \in L(e)\) and with \(k(w_2)\) evaluating to True, then
acc e w k
must evaluate to True.
If for every splitting of the input string \(w = w_1 w_2\), either \(w_1 \notin L(e)\) or \(k(w_2)\) evaluates to False, then
acc e w k
must evaluate to False.
Given such an implementation of acc
, we can then define
our regular expression matcher by:
= acc e w null accept e w
where we have taken the “initial continuation” to be the function
null :: String -> Bool
, since we want to ensure that
\(e\) matches the entire word \(w\) and not just a prefix. Assuming
acc
matches the specification above, then
accept e w
will evaluate to True just in case \(w \in L(e)\), and otherwise it will
evaluate to False. (You should prove this for yourself!)
Implement the function acc
for Star
-free
regular expressions by completing the definition below (the cases for 0
and 1 have already been provided, for reference).
-- Exercise 4a
acc :: RegExp -> String -> (String -> Bool) -> Bool
Zero w k = False
acc One w k = k w
acc C a) w k = undefined
acc (Plus e1 e2) w k = undefined
acc (Times e1 e2) w k = undefined acc (
For example, after implementing acc
, you should see the
following results:
> accept (Times (Plus (C 'a') One) (Plus One (C 'b'))) "ab"
ghciTrue
> accept (Times (Plus (C 'a') One) (Plus One (C 'b'))) "b"
ghciTrue
> accept (Times (Plus (C 'a') One) (Plus One (C 'b'))) "ba"
ghciFalse
Now extend acc
to work for expressions containing
Star
:
-- Exercise 4b (optional)
Star e) w k = undefined acc (
For example, you should see the following results:
> accept (Times (Star (Times (C 'a') (C 'b'))) (C 'a')) "ababa"
ghciTrue
> accept (Times (Star (Times (C 'a') (C 'b'))) (C 'a')) "abab"
ghciFalse
> accept (Star (Plus (Star (C 'c')) (Times (C 'a') (C 'b')))) "abccabcccab"
ghciTrue
Hint: you need to be careful to ensure that your matcher does not go into an infinite loop! If it does, try to understand why and find a resolution. If you get stuck or if you are just curious you can read Harper’s article linked to above for a suggestion of one way to resolve the issue. (But be warned there is a typo in Section 4 of the paper, in the definition of the function \(r^-\). The English description of the set of non-empty strings matching \(r_1 r_2\) is correct, but the equation for \((r_1 r_2)^-\) is wrong.)
Finally, we should warn that although continuation-passing style is a very powerful and general technique, matching regular expressions this way is in fact not a good idea! To understand why, you can begin by evaluating the expression
Star (Plus (Times (C 'a') (C 'a')) (Times (C 'a') (C 'a')))) (replicate 101 'a') accept (
in GHC, and then (while you are waiting for it to finish) read this article by Russ Cox.