[CSE301](https://www.lix.polytechnique.fr/Labo/Noam.ZEILBERGER/teaching/CSE301/) # Lab 1 (getting started with Haskell) 1. Read the instructions for "Getting up and running with Haskell" on the main [course page](../../index.html). 2. Open `ghci`, and try out some examples from Chapter 2 of [Learn You a Haskell for Great Good!](http://learnyouahaskell.com/starting-out#ready-set-go). Type `:quit` once you've tried enough examples. 3. Read "A quick introduction to Haskell" below. 4. Do the marked exercises and upload your solutions as one file `Lab1.hs` on Moodle. # A quick introduction to Haskell _Adapted from notes by [Martín Escardó](https://www.cs.bham.ac.uk/~mhe/)._ # Contents * [Built-in types](#builtintypes) * [Tuples](#tuples) * [Lists](#lists) * [Pattern-matching on lists](#pmlists) * [null](#null) * [head](#head) * [reverse](#reverse) * [qsort](#qsort) * [msort **(Exercises!)**](#msort) * [Side-effects](#effects) * [Laziness](#laziness) * [Full example **(Exercise!)**](#fullex) ## Built-in types The Haskell language comes equipped with a few basic, built-in types: * `Bool`: with values `True` and `False`, as well as standard boolean operations like conjunction (`&&`), disjunction (`||`), and negation (`not`). * `Integer`: the type of integers supporting arbitrary-precision arithmetic. This type is used by default when performing integer arithmetic in the interpreter, e.g., if you ask `ghci` to compute the hundredth power of five: ``` ghci> 5^100 7888609052210118054117285652827862296732064351090230047702789306640625 ``` * `Int`: the type of fixed-precision, machine-word sized signed integers (the exact size is architecture-dependent, but must include at least the range [-2^29 .. 2^29-1]). You can force the interpreter to do arithmetic using fixed-precision integers by including a type annotation, e.g., like so: ``` ghci> 5^100 :: Int -3842938066129721103 ``` * `Char`: the type of Unicode characters. Character literals are entered in quotes, e.g., `'a'` represents the first lowercase letter of the alphabet. The module [`Data.Char`](https://hackage.haskell.org/package/base-4.15.0.0/docs/Data-Char.html) from the standard library contains various useful routines for manipulating characters, such as `toUpper :: Char -> Char` and `toLower :: Char -> Char`. Example: ``` ghci> import Data.Char ghci> toUpper 'a' 'A' ``` Observe that we began by importing the `Data.Char` module. Alternatively we could have entered the full path `Data.Char.toUpper` to the routine. By the way, we can get a list of all of the current imported modules using the `:show imports` command: ``` ghci> :show imports import Data.Char import Prelude -- implicit ``` * `String`: the type of strings. Strings are entered in double quotes, for example `"hello"` is an expression of type `String`. Actually, in Haskell, `String` is just a *synonym* for `[Char]`, the type of lists of characters. This means in particular that any generic operation on lists (such as concatenation `++`) can also be applied to strings. ``` > "hello" ++ "world" "helloworld" > length("hello" ++ "world") 10 ``` * `Float` and `Double`: these are respectively the types of single-precision and double-precision floating point numbers. The latter is used by default in the interpreter: ``` > 1 / 5^100 1.2676506002282291e-70 > 1 / 5^100 :: Float 0.0 ``` * `Rational`: the type of arbitrary-precision fractions. ``` > 1 / 5^100 :: Rational 1 % 7888609052210118054117285652827862296732064351090230047702789306640625 ``` ## Tuples Two or more types can be combined to form *tuple types*, which are notated as a comma-separated list surrounded by round parentheses. For example, `(Char,Int)` is the type of pairs of a character and a fixed-precision integer, while `(Double,Double,Double)` is the type of triples of double-precision floating point numbers. The same notation can be used to construct values, e.g., `('a',2)` is a value of type `(Char,Int)`. The components of a pair can be extracted using the functions `fst` and `snd`, which are examples of *polymorphic* functions in the sense that they work irrespective of the types of the components. For example `snd('a',2)` evaluates to `2`. By the way, you can ask `ghci` to tell you the most general type of an arbitrary expression with the `:type` directive: ``` > :type fst fst :: (a, b) -> a > :type snd snd :: (a, b) -> b ``` (In a later lecture, we will talk about how GHC computes these types.) Haskell also provides a `let` construct which can be used to extract the components of a tuple of arbitrary dimension: ``` > let (x,y,z) = (1,pi,sqrt(2)) in x + y + z 5.555806215962888 ``` ## Lists Haskell has support for lists of any *homogenous* type, where `[a]` (in square brackets) denotes the type of lists of `a`s. For example, `[Integer]` is the type of lists of arbitrary-precision integers, and `[Int]` is the type of lists of fixed-precision integers. Lists can be constructed using a variety of notations, here are some examples: * `[]` is the empty list. The empty list is sometimes pronounced "nil". * `[1,2,3]` is a list with three elements. * `[1..10]` is a list, equal to `[1,2,3,4,5,6,7,8,9,10]`. * `0 : [1,2,3]` is equal to `[0,1,2,3]`. The symbol `:` (pronounced "cons") is for adding a new head to a list. * `[3]` is a one-element list, equivalent to `3 : []`. * The symbol `++` is used for concatenation of two lists. For example `[1,2,3] ++ [100,101,102]` evaluates to `[1,2,3,100,101,102]`. * `[(x,x-1) | x <- [1..10], mod x 2 == 0]` is an example of a *list comprehension*, which evaluates to the list of pairs `[(2,1),(4,3),(6,5),(8,7),(10,9)]`. Lists can also be deconstructed in a variety of ways: * The `head` of a non-empty list is its first element, its `tail` is the list of remaining elements. ``` > head [1..5] 1 > tail [1..5] [2,3,4,5] ``` * Dually, applying `last` to a non-empty list returns its last element, while `init` drops the last element: ``` > last [1..5] 5 > init [1..5] [1,2,3,4] ``` Beware, though, that due to the way lists are represented internally, these two pairs of functions have very different time complexities! Both `head` and `tail` are Θ(1) in the length of the list, while `last` and `init` are both Θ(n). * The operation `xs !! i` can be used to access the `i`th element of a list `xs`, under a 0-based indexing scheme. Here are a couple examples (recall that strings in Haskell are just lists of characters): ``` > [1..5] !! 1 2 > "hello" !! 1 'e' ``` * The operations `take n xs` and `drop n xs` respectively take and drop the first `n` elements of a list `xs`: ``` > take 3 [1..5] [1,2,3] > drop 3 [1..5] [4,5] ``` * Via pattern-matching, as we will discuss shortly. ## Pattern-matching on lists Pattern-matching is a concise notation for defining functions over data types, which works by partitioning the range of possible inputs into a finite set of mutually-exclusive cases. It is especially powerful when combined with user-defined data types, but it is also very convenient as a way of defining functions over lists. We will talk more about pattern-matching in Lecture 1, but for now let's consider a few examples... ### null The following is a possible definition of the standard library function `null :: [a] -> Bool`, which determines whether or not a list is empty: ```hs null [] = True null (x:xs) = False ``` Observe that there are two equations corresponding to two possible *shapes* of input: one case for the empty list `[]`, and another case for an arbitrary non-empty list of the form `x:xs`. Since the variables `x` and `xs` are not used in the right-hand side of the second case, we could have also written ```hs null (_:_) = False ``` making use of the "wildcard" pattern `_`. Equivalently, we could also use Haskell's `case` construct to perform the pattern-matching: ```hs null xs = case xs of [] -> True (_:_) -> False ``` ### head The destructor `head` mentioned above can be defined using pattern-matching as follows: ```hs head (x:xs) = x ``` This definition says, concisely, that the `head` of any non-empty list `x:xs` is `x`. But what happens if we take the `head` of an empty list? ``` > head [] *** Exception: Prelude.head: empty list ``` In general, when the input does not match any one of patterns in the definition of a function, Haskell generates a pattern-matching exception by default — although here, the [definition of `head` in the standard library](https://hackage.haskell.org/package/base-4.17.0.0/docs/src/GHC.List.html#head) actually deals explicitly with the case of the empty list in order to produce a tailored error message, which is good practice. ### reverse The standard library function `reverse :: [a] -> [a]`, which reverses a list, can be defined recursively as follows*: ```hs reverse [] = [] reverse (x:xs) = reverse xs ++ [x] ``` Notice that in the case of a non-empty list `x:xs` we make a recursive call to `reverse xs` and concatenate the result with the singleton list `[x]`. (*Actually, this implementation of `reverse` has the flaw that it takes time Θ(n²) in the length of the list. We will discuss how to write a better version in Lecture 1.) ### qsort Here is a very concise version of [Quicksort](https://en.wikipedia.org/wiki/Quicksort), defined using pattern-matching and list comprehensions: ```haskell qsort :: Ord a => [a] -> [a] qsort [] = [] qsort (x:xs) = qsort ys ++ [x] ++ qsort zs where ys = [a | a <- xs, a <= x] zs = [b | b <- xs, b > x] ``` Some other things worth mentioning here: * The stuff before `=>` in the type declaration are called type constraints. In this example, the constraint is that the type `a` must be in the class `Ord` of types that admit an ordering. (We will talk more about type class constraints in Lecture 2.) * The `where` construct allows introducing local definitions *after* their use. Finally, you might be wondering about how to enter definitions like the above into the GHC interpreter. If you try to copy the lines above directly into `ghci` then you will get a bunch of errors, since it is does not accept multiline input by default. The best way of dealing with this is probably to copy the definition into a file, and then load the file. For example, if you copy the above lines into a file named `Lab1.hs`, you can load the definition of `qsort` by running `:load Lab1.hs` or `:load Lab1`: ``` ghci> :load Lab1 [1 of 1] Compiling Main ( Lab1.hs, interpreted ) Ok, one module loaded. ghci> qsort [3,1,2] [1,2,3] ``` Alternatively, you can copy multiline definitions directly into `ghci` by wrapping them in braces `:{` ... `:}`, for example: ``` ghci> :{ ghci| qsort :: Ord a => [a] -> [a] ghci| qsort [] = [] ghci| qsort (x:xs) = qsort ys ++ [x] ++ qsort zs ghci| where ghci| ys = [a | a <- xs, a <= x] ghci| zs = [b | b <- xs, b > x] ghci| :} ghci> qsort [3,1,2] [1,2,3] ``` ### msort The [Mergesort](https://en.wikipedia.org/wiki/Merge_sort) algorithm makes use of an operation `merge`, which takes two sorted lists and returns a single list with all their elements in sorted order: ```haskell merge :: Ord a => [a] -> [a] -> [a] merge [] [] = [] merge [] ys = ys merge xs [] = xs merge (x:xs) (y:ys) = if x <= y then x : merge xs (y:ys) else y : merge (x:xs) ys ``` Alternatively, we can use *pattern guards* rather than an if-then-else expression to define `merge`, as in the following equivalent definition: ```hs merge :: Ord a => [a] -> [a] -> [a] merge [] [] = [] merge [] ys = ys merge xs [] = xs merge (x:xs) (y:ys) | x <= y = x : merge xs (y:ys) | otherwise = y : merge (x:xs) ys ``` **Exercise**: complete the definition of Mergesort by writing a function `msort :: Ord a => [a] -> [a]` which sorts a list by splitting it in two pieces, then recursively sorting those pieces and merging them back together. **Exercise**: write a function `isSorted :: Ord a => [a] -> Bool` which decides whether the input list is sorted *in linear time*. Verify that `isSorted (reverse [1..100]) = False`, and that `isSorted (msort (reverse [1..100])) = True`. ## Side-effects In addition to computing values, a typical running program can have side-effects such as printing to the screen, querying the user for input, writing to a file, etc. Haskell is a bit idiosyncratic in confining such "impurities" to so-called monadic types, such as [IO()](http://learnyouahaskell.com/input-and-output). ```haskell hello :: IO() hello = do putStrLn "Hello, what's your name?" name <- getLine putStrLn ("Hey " ++ name ++ ", you rock!") ``` On the other hand, a function of type `Integer -> Integer`, for example, can't do anything other than return an integer.* [*Or possibly go into an infinite loop or raise an exception, which are considered as "harmless" side-effects in Haskell.] If you do want to define a function that performs some side-effects in addition to (or instead of) returning a value, then you have to assign it a monadic type like `Integer -> IO Integer`. (We will talk more about side-effects and monads in Lecture 4.) ## Laziness Another thing that distinguishes Haskell from many other functional programming languages (such as OCaml and Agda) is that Haskell enforces *call-by-need* (or "lazy") evaluation. ```haskell bigInteger :: Integer bigInteger = last [1..10000000000000000] f :: Integer -> Integer f x = 17 example1 :: IO() example1 = putStrLn (show(f bigInteger)) ``` The above program works as follows: the expression `[1..10000000000000000]` produces the list of numbers from 1 to 10000000000000000, one by one. This is a big list, and won't fit in your computer. the function `last` takes the last element of this list. This will take a long time, but won't exhaust memory, because Haskell is lazy, and will create the elements of the list on demand. Because Haskell is lazy, `bigInteger` (which is 10000000000000000), won't be evaluated until it is needed. The function `f` doesn't use its argument, and hence in the call `f bigInteger`, the argument `bigInteger` is not evaluated. So the function `example1` prints 17 very quickly, without ever evaluating `bigInteger`. An equivalent program in a call-by-value language such as Java wouldn't work in practice, as the call `f bigInteger` would attempt to evaluate `bigInteger` first. The above is a silly example. One more practical use of laziness is to read a whole, large file, then process it, then output. Because of laziness, the whole file won't be read at once, but on demand, automatically. Another use of laziness is to work with infinite objects, such as lists. For example, `[0..]` is the infinite list `[0,1,2,3,...]`. Its evaluation never terminates. However, `take 5 [0..]` evaluates to `[0,1,2,3,4]` in finite time. (We will talk more about lazy evaluation and infinite objects in Lecture 5.) ## Full example This code converts a Markdown file to a Haskell file, by extracting the code only: ```hs import Data.Char begin = "```haskell" end = "```" mdtohs :: [String] -> String mdtohs [] = [] mdtohs (xs:xss) | take (length begin) (dropWhile isSpace xs) == begin = copy (length (takeWhile isSpace xs)) xss | otherwise = mdtohs xss copy :: Int -> [String] -> String copy i [] = [] copy i (xs:xss) | take (length end) (dropWhile isSpace xs) == end = "\n" ++ mdtohs xss | otherwise = drop i xs ++ "\n" ++ copy i xss main :: IO() main = interact(mdtohs . lines) ``` The function `lines :: String -> [String]` converts a string into a list of strings, one for each line terminated by "\\n". Here we compose it with the function `mdtohs :: [String] -> String` using the function composition operator `(.)`. The function `interact :: (String -> String) -> IO ()` converts a function `String -> String` into a program of type `IO()` which reads from the standard input (stdin) into the standard output (stdout). For example, after copying the above into a file named "mdtohs.hs", the following command generates the file `Lab1.hs` from the file `Lab1.md` under a Unix-like system: ``` $ cat Lab1.md | runhaskell mdtohs.hs > Lab1.hs ``` Some explanations: * The cat command echoes the file [Lab1.md](Lab1.md) to stdout. * The unix pipe `|` makes the standard output of the previous thing into the standard input of the next thing. * Then we run Haskell with the file `mdtohs.hs` containing the above conversion code. * The linux redirection `>` saves the output of the previous thing to the file named after it, namely `Lab1.hs` in this case. Since we may want to run the Markdown-to-Haskell conversion code often, it is worth compiling the file `mdtohs.hs` to produce an executable program, which we can do as follows (the `-O2` flag tells the ghc compiler to turn on some optimizations): ``` $ ghc -O2 mdtohs ``` Now we can run the `mdtohs` program directly (rather than by calling `runhaskell`), e.g., as follows: ``` $ cat Lab1.md | ./mdtohs > Lab1.hs ``` **Exercise**: write a stand-alone Haskell program (equipped with a `main :: IO()` method) that reads lines from standard input and writes them back to standard output in sorted order. Your program should be roughly equivalent to the Linux [`sort`](https://man7.org/linux/man-pages/man1/sort.1.html) command with no arguments. More precisely, it should output the lines in lexicographic order according to their ASCII values, which is the default way of comparing strings in Haskell. You can force the Linux `sort` command to sort the same way by setting the environment variable `LC_ALL=C`. For example, if the file [`alouette.txt`](alouette.txt) contains the following text: ``` Alouette, gentille alouette, Alouette, je te plumerai. Je te plumerai la queue. Et la queue ! Et les pattes ! Et les ailes ! Et le cou ! Et les yeux ! Et le bec ! Et la tête ! Alouette ! Ah ! ``` then running `cat alouette.txt | runhaskell Lab1.hs` (or alternatively, compiling `Lab1.hs` and then running `cat alouette.txt | ./Lab1`) should send the following to standard output: ``` Ah ! Alouette ! Alouette, gentille alouette, Alouette, je te plumerai. Et la queue ! Et la tête ! Et le bec ! Et le cou ! Et les ailes ! Et les pattes ! Et les yeux ! Je te plumerai la queue. ``` You might find it helpful to use the standard library function `unlines :: [String] -> String`, which acts as an inverse to the function `lines :: String -> [String]` already mentioned. (By the way, the [hoogle](https://hoogle.haskell.org/) search engine is a good way of finding out about such library functions; `unlines` is the first hit if you search for the type `[String] -> String`.)