


An Introduction to

Functional Programming Using Hugs

Peter Wentworth Department of Computer Science

Rhodes University Grahamstown 6140

South Africa

e-mail: [email protected]

Last revision: January 2013


Haskell Flakes [1]

[FADE IN] [A young boy sits at a table in the kitchen. Outside the window we see the morning sun setting in a rich blue sky over rolling green hills and yellow flowers. The young boy’s ravishing mother stands by the kitchen counter, looking busy making bag lunches.]

[Close up of the boy looking dejectedly at his bowl of cereal.]

Boy: "Mom, I’m sick of ordinary cereals! Don’t you have anything new?"

Mom: "Try this, honey!" [hands the boy a colorful package, the words Haskell emblazoned across the front and top, over a large lambda logo]

Boy: [after pouring the cereal flakes into his bowl and cramming a spoonful into his mouth] "Mmm! What is it?"

Mom: "It’s called Haskell! The ingredients are all natural! No chemical additives!"

Boy: "And it tastes great too!" [rotates his baseball cap back-to-front and with a look of determination starts shoveling the flakes greedily down his throat]

[Pan to Dad walking in, suit jacket draped over one arm and still fumbling boyishly with his tie.]

Dad: "Gosh! Something sure smells good this morning! " [sniffs experimentally at the Haskell flakes, then notices the box of Haskell cereal sitting prominently in the middle of the table; lifts the box and peers at it; suddenly his eyes widen with recognition] "Hey! Is this that Haskell I’ve been hearing about? "

Mom: "Yes, it’s new and improved!" [takes the Haskell box and taps one finger conspicuously at the place where a large "1.4" is visible]

Dad: "Wow, count me in too!" [forgets about his tie and slips quickly into a chair at the breakfast table; lifts an empty cereal bowl expectantly in Mom’s direction]

Mom: [smiles as if she knew this would happen, then steps forward, holds the box of Haskell in both hands next to her glowing face and looks straight at the camera] "Haskell! A great way to start the morning!"

Dad: [in the background, whining] "Honey!"

Mom: [now laughing, shakes her head as if to say ’tsk, tsk’, and hurries back to the table to serve her hubby a fresh bowl of Haskell flakes] [FADE OUT]

These notes are Copyright (c) 2013, EP Wentworth

[1] Shamelessly pinched from comp.lang.ml


Table of Contents

Haskell Flakes
1 Introduction
2 Getting Started
  2.1 Simple Definitions
  2.2 Basic Types
  2.3 Type signatures
  2.4 Lists and Strings
  2.5 Comments
  2.6 Program layout and the offside rule
  2.7 Patterns
  2.8 Exercises
  2.9 Key Points
3 Polymorphism, and Type Classes
  3.1 Introduction
  3.2 A Problem
  3.3 Type Classes
  3.4 In Practice
  3.5 Exercises
  3.6 Key Points
4 Progress Through Recursion
  4.1 Next Examples
    4.1.1 Appending two lists
    4.1.2 take and drop
    4.1.3 Number Conversions
    4.1.4 Reversing a list - try 1
    4.1.5 Styles of Recursion
    4.1.6 Forward Recursion
  4.2 Local Definitions
  4.3 Examples
    4.3.1 Greatest Common Divisor
    4.3.2 Fibonacci Numbers
  4.4 Exercises
  4.5 Key Points
5 Type Synonyms and Tuples
  5.1 Type Synonyms
  5.2 Tuple Types
  5.3 Timeout
  5.4 Exercises
  5.5 Key Points
6 Generators, Filters and List Comprehensions
  6.1 Generators
  6.2 Filters
  6.3 map
  6.4 List Comprehensions
    6.4.1 Multiple generators
  6.5 Discussion and Examples
  6.6 Exercises
  6.7 Key Points
7 Data Types
  7.1 Data Declarations
    7.1.1 Simple multi-field records
    7.1.2 Towards Abstract Data Types
    7.1.3 Polymorphic data types
    7.1.4 Multi-case types
    7.1.5 Recursive data types - Binary Trees
    7.1.6 Visiting all the nodes in a binary tree
  7.2 Huffman Encoding
  7.3 Exercises
  7.4 Key Points
8 Lambda Expressions
  8.1 Lambda Expressions
  8.2 Towards a Query Language
  8.3 Reflection
9 Program Transformation
  9.1 Some Laws of Haskell
  9.2 Structural Induction
  9.3 Equivalence of Functions
  9.4 Exercises
  9.5 Key Points
10 More about Functions
  10.1 Some Notation
  10.2 Infix Operators
  10.3 Currying
  10.4 Sections
  10.5 Examples
  10.6 Function Composition
  10.7 Exercises
  10.8 Key Points
11 List Operators
  11.1 The Basic List Operators
  11.2 Examples of the List-Operator Style
  11.3 Homomorphisms
  11.4 Listless Style
  11.5 Exercises
  11.6 Key Points
12 Lazy Evaluation
  12.1 Introduction
  12.2 Unbounded Objects can Simplify Matters
  12.3 Laziness is a Decoupling Mechanism
  12.4 Discussion and Examples
  12.5 Exercises
  12.6 Key Points
14 Stateful computation through Monads
  14.1 We need stateful computation too!
  14.2 How did they do that?
  14.3 The type (IO a) is a Monad
  14.4 Examples
    14.4.1 Hello!
    14.4.2 List a file to the console
    14.4.3 Copy a file from src to dst
    14.4.4 List all lines containing some word ...
  14.5 Further reading
  14.6 Key Points
15 Circular Structures
  15.1 Introduction
  15.2 You're Into a Time-Warp
  15.3 Prime Numbers Revisited
  15.4 Hamming Numbers
  15.5 Fibonacci Numbers
  15.6 Discussion
  15.7 Exercises
  15.8 Key Points
16 A Broader Picture, and Further Readings
  16.1 The Style
  16.2 The Reason
  16.3 The Warning
  16.4 The Reading
17 Bibliography
18 Some Useful Haskell Functions
19 Index


1 Introduction "Much more interesting than answers are different points of view, new attitudes, opinions, that eventually lead to a general truth." John Sculley, Former CEO Apple Computers, in his book Odyssey, Pepsi to Apple.

Functional programming is different!

There are four main kinds of computer languages - imperative, object-oriented, logic and functional. Each has its own style, and its own particular ideas about how complex computation can best be organized and expressed. Each style is a different perspective, rather like wearing tinted spectacles of different colours. Each style has its own strengths and weaknesses.

Many of the tricks and techniques you already know from object and imperative languages like Java or C# won’t immediately fit this different way of thinking. But you will get the most benefit out of studying this way of doing things if you can let go of the idea that you already know how to do it some other way, and welcome the opportunity to get a new perspective.

Functional programming is not as commercially successful as the object-oriented techniques - but don’t despair on this score either. Almost all functional programmers are really good at conventional languages too. And many of the newer features creeping into languages like Java, C# and Python are directly based on features from the Functional Programming world.

We’ve also seen the recent emergence of multi-paradigm languages – F# and Scala are two noteworthy examples.

What you’ll learn in this style will hopefully have a deep, lasting and beneficial influence on the way you think about all programming, in any computer language.

These notes introduce Haskell.

Enjoy!


2 Getting Started

2.1 Simple Definitions

Our first example introduces a function to compute factorials: 5 factorial is 5×4×3×2×1, and is often calculated recursively, i.e. 5 factorial can be computed as 5 × (4 factorial).

fact:: Integer -> Integer      -- a declaration
fact n = if n == 0 then        -- a definition
            1
         else
            n * fact (n-1)

A Haskell program consists of one or more separate modules, with each module containing a number of declarations and definitions. The declaration here gives the type information for function fact, specifying that it maps an Integer argument into an Integer result. The definition gives the rules for computing fact.

Functions may call each other recursively, either directly (fact calls itself) or indirectly (f calls g (and h ...) which eventually calls f again).

There is only one way to form more complex expressions from simpler ones in Haskell: apply a function to some arguments.

A function is applied to its arguments by simply writing the function name followed by the arguments. Unlike most other languages, we don't enclose the arguments in parentheses, or separate them with commas. So here we show how to call a function with three arguments:

replaceAll 'x' 's' "Thix ix eaxy"

Operators like + and - are also functions (binary functions, meaning they take exactly two arguments). And, like most other languages, we allow these functions to be written between their arguments. Functions written between their arguments are called infix functions. Functions which are written before their arguments are prefix functions.
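The function replaceAll is used here only to illustrate the calling syntax; the notes do not define it. Purely as an illustration, one possible definition (a sketch of our own, using list patterns and guards that are introduced later in this chapter) might be:

replaceAll:: Char -> Char -> String -> String
replaceAll old new []     = []
replaceAll old new (c:cs)
  | c == old  = new : replaceAll old new cs
  | otherwise = c   : replaceAll old new cs

-- replaceAll 'x' 's' "Thix ix eaxy" → "This is easy"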

Because we don’t use parentheses and commas when calling a function with some arguments, we have to be careful when we mix infix and prefix functions in the same expression. Consider double 3 + 4. Do we double the 3, then add 4, or first add 3 and 4, and then double the result? We need to be clear about the precedence rules. In this case the rule is that function application always binds more tightly (takes precedence over) any infix operator. So the above will be evaluated as (double 3) + 4 and not as double (3 + 4).

Like other computer languages, parentheses can be used to explicitly control the grouping of operators and operands to avoid any ambiguity, and to say exactly what you mean.
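To make the precedence rule concrete, here is a small sketch (the function double is our own illustration, not something defined in these notes):

double:: Integer -> Integer
double n = n + n

-- double 3 + 4     is read as (double 3) + 4, giving 10
-- double (3 + 4)   gives 14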



When you create your own Haskell functions, you can make use of a rich set of definitions which are already provided for you in the standard library (called the prelude). Definitions in the prelude are automatically available to all programs.

We will return to the subject of functions and operators in a later chapter.

2.2 Basic Types

Every type name begins with a capital letter. The basic types in Haskell are

• Int — e.g. 123, 146

• Integer — this is an unbounded size integer, e.g. 419284701298347126353264528715243745128374615234

• Float and Double — (Double carries better precision). e.g. 3.1415926, 10.0e-4

• Char — single ASCII characters, e.g. 'a', 'b', '?'

• Bool — has only two possible values False, True

• functions. Prefix functions like fact and cos are always named with identifiers that begin with a lowercase letter. Infix operators have names like +, -, *, == and so on. Many of the common infix operators are already defined in the prelude.

All functions, whether they are infix or prefix, have exactly the same status in the language. They are also first-class, and can serve in the same roles as any other kind of values — i.e. they can be passed as arguments, or returned as the result of another function, or they can occur as elements of a list or any other data structure. (If you don’t think this is different, try to put a function like == in Java or C# into a data structure.)

2.3 Type signatures

Every expression (and a function is also a trivial expression) in Haskell has a type signature which describes its possible values. For example, the value False has type Bool in Haskell. We write a double colon (which we pronounce as "has type") after a value to denote its type. For example, 123::Int or 'a'::Char.

One of the interesting ideas in Haskell is that the type signature of a function (or any other kind of value) is separated out from the definition of the function. In other languages the types of the parameters are usually part of the function definition.

The function odd tests if its integer argument is odd, and returns either True or False. Its type signature is

odd:: Int -> Bool

The arrow in the type signature is pronounced as "to", so when we read the type signature, we'll say "This function has type Int to Bool", or "it maps an Int to a Bool".

Functions which expect more than one argument are declared as though they take their arguments one at a time. Thus the type signature for mod (which finds the remainder after dividing its first argument by its second) is

mod:: Int -> (Int -> Int)

The symbol -> associates to the right, so the parentheses can be (and usually are) omitted. The declaration indicates that if mod is given only one argument, it returns another function which in turn expects one more argument and yields an Int result. This idea that a function can be applied to too few arguments (or applied to one argument at a time) and it can yield a result which is another function is called Currying the function, and will be discussed further in a later chapter.
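As a small illustration of this idea (the name remOf100 is our own, chosen only for this sketch):

-- Supplying mod with just its first argument yields a new function
-- that still expects the divisor.
remOf100:: Int -> Int
remOf100 = mod 100

-- remOf100 7  → 2    (the same as mod 100 7)
-- remOf100 30 → 10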

2.4 Lists and Strings

Programming languages consist essentially of two types of components: control mechanisms and data mechanisms. Control mechanisms are things like sequencing, looping, function calls, and conditional branching. Data mechanisms are your arrays, lists, trees, strings, and so on.

Often the elegance and power of a language comes from a good match between these. One of the strengths of list-processing languages like Haskell is this coupling: function calling with recursion is used as the key control mechanism, and we tend to nearly always operate on recursively defined data structures. This often leads to a direct correspondence between the form of the data and the form of the algorithm – and we’ll try to push this correspondence in these notes as being a Good Thing.

The primary built-in data structure in Haskell is the list. A recursive definition of a list of elements is

• A list is either an empty list (written as []), or

• a list is a pair — a head element of some type, and a tail which in turn is a list of zero or more elements of the same type.

(You did spot the recursion there, didn’t you? A list .... has a tail, which is a list ...)

Non-empty lists are constructed by using an infix data constructor :, pronounced "cons", which associates to the right. Using our recursive definition, [] is a list, 3:[] is a list, 2:(3:[]) is a list, and so on. Because : associates to the right, the parentheses can be omitted.

The elements in a Haskell list must all be of the same type.

Operations that build data structures are called data constructors, or more simply constructors to distinguish them from normal functions. The names of infix constructors must always start with a colon symbol, and the names of prefix constructors always start with a capital letter.

The list bracket syntax [1,2,3] is a shorthand for 1:2:3:[], and is equivalent.
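For emphasis, all of the following expressions denote exactly the same three-element list (this is just a restatement of the shorthand above, not new notation):

[1,2,3]
1:2:3:[]
1:(2:(3:[]))
1:[2,3]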

The function null can be applied to any list to test if the list is empty.

The functions head and tail can be used to select the two parts of a list, although a more elegant pattern mechanism will be introduced shortly. For example, head [5,4,3] yields 5, while tail [5,4,3] yields [4,3].

The following identities between head, tail and : always hold. It is an error to apply head or tail to an empty list.

head (a:b) = a
tail (a:b) = b

Haskell uses a shorthand notation for lists of characters: they can be entered as strings, enclosed in string quotes. Strings may contain escape sequences. To capture the notion that a string is a list of characters, the system acts as if the following type synonym declaration is in force. (In type signature declarations, the square bracket notation is pronounced list of.)

type String = [Char]

The string "sammy" can therefore also be written as ['s','a','m','m','y'] or it can be written as 's':'a':'m':'m':'y':[].

Since a string is shorthand for a list of characters, we have

head "sammy" → 's'
tail "sammy" → "ammy"

(The arrow in the notation tail "sammy" → "ammy" is not part of the program. It is meta-notation used in these notes to show that when tail is called with the argument "sammy", it returns the result "ammy".)

Here are some more examples of string literals:

"This is a string.\n"
"This is how we put a quote \" inside a string."

This example counts the number of characters in a string. The key to the formulation comes from examining the data structures involved. In the definition of a list we note that there are two cases: either the list is empty, or it comprises a head element and a recursive list structure. Making the algorithm match this view of the data structure leads to the solution.

leng:: String -> Int
leng s = if null s
         then 0
         else 1 + leng (tail s)

leng "hello" → 5

2.5 Comments

There are two kinds of comments in Haskell. The -- token introduces a line comment that ignores everything to the end of the line. Block comments are enclosed between matching sequences {- and -}. Block comments may be nested.




2.6 Program layout and the offside rule

In order to group declarations together, Haskell doesn't use keywords like begin and end or curly braces and semicolons like C# and Java do. Instead, Haskell uses the layout of your program to save the bother of begin and end. Here is a let construct (which we have not covered yet) where three sub-definitions are created. As a human reader, it is pretty obvious where we want each of the sub-definitions to start and to finish - and we don't need those pesky semicolons and braces.

... let area r = pi * r * r
        pi = 3.1415926
        e = 2.7
    in area e ...

There are a few points to note:

• Sensible layout is used instead of semicolons and braces. This code creates three sub-definitions (for a function called area, which has a single parameter r, a constant called pi and another constant called e) in order to compute the area of e.

• The first definition after the let keyword starts in a particular column (in this case the column in which the definition of the function area starts). This becomes the offside column for any definitions embedded in the let. This is the column containing the first letter ‘a’ in the function name area.

• The first layout rule is that any part of a definition must stay to the right of its offside column. The compiler knows that it has found the end of the current definition when it finds something offside. In this example, the parameters and body of the area function must be written to the right of the offside column.

• The other layout rule is that any other definitions at the same “level” as area must begin in the same offside column.

• The let construct (like all others) has its own scope, defined by its own offside column within its containing part of the program. When your program text next starts to the left of, or on the same column as, the outer offside column, the compiler knows that the let construct has ended.

• Comments are exempt from the offside rule, so they can start in any column without closing off or ending the current definition.
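As an illustration (our own, not from the notes), here is a layout that breaks the offside rule: the definition of pi starts to the left of the offside column set by area, so the compiler decides the list of definitions has ended and then reports a confusing syntax error when it finds pi where it expected in.

... let area r = pi * r * r
      pi = 3.1415926
    in area 2.7 ...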



2.7 Patterns

The definition of leng in the previous section can be further improved by noting that it consists of two cases: one for the empty list and one for the non-empty list. Haskell allows us to specify these two cases in separate legs, rather than use an explicit conditional construct. This further serves to reinforce the correspondence between the form of the data and the form of the algorithm. The following example to sum the elements of a list illustrates the use of multiple legs:

sum:: [Int] -> Int
sum []     = 0
sum (x:xs) = x + sum xs

sum [1,2,3,4,5] → 15

This definition consists of two legs. The parameters of sum which occur on the left hand sides of the legs are called patterns, and are pattern-matched to the arguments. (We use the term arguments to denote the values passed by the caller to the function, while we reserve the term parameters or patterns to denote the formal (dummy) parameters in the legs.) In this example, the first leg will be selected only if the argument is []. The second leg is selected whenever the actual argument is not [], whereupon its head and tail components are named x and xs respectively.

In practice, observe these tips:

• Behind your back, the compiler uses your layout to insert some extra semicolons and curly braces into the program. But if you get the layout wrong, a later stage of the compiler might complain that it found a semicolon where it wasn't expecting your definition to end. You won't be able to see the semicolon it is complaining about, because it doesn't exist in your code at all! Just grin, think hard about "Was this really a clever choice on the part of the Haskell designers?", and "Is using layout the bad idea, or is it just a poor implementation of a good idea that is producing these obscure error messages?" When you've made your mind up, go ahead and fix your layout nicely.

• When editing and putting your function definitions into a file, always start each new definition at the leftmost column. If you don't, you are likely to have offside problems with respect to other definitions in the same file.

• Avoid tabs. If your editor and the compiler have different ideas about how many spaces a tab character represents, things can get messy.

• Use a monospaced font – one in which all characters are the same width.


Up to this point, the syntax of patterns is a proper subset of the syntax of general expressions: the most trivial patterns are identifiers and constants, and these can be built into more complex patterns using data constructors. (The only constructor encountered so far has been the list constructor, but this will be remedied in due course! ) Note that identifiers are trivial patterns, so that xs is a pattern in a definition like leng xs = ... Patterns have two roles in Haskell. They provide a selection mechanism to choose the appropriate leg of a definition. They also serve to split up the argument into its constituent components, and to name these fields. This means that there is no longer any need for “field selectors” like head and tail. Taking the list argument apart is coupled to the calling mechanism, and uses the same syntax as was used to construct the list in the first place. The same notation can either be used on the right-hand side of an equation, in which case it constructs an object from its components, or on the left-hand side in a pattern, in which case it takes the structure apart and names the individual components.

Novice users should take careful note of these two different, but related uses of the constructors. The notation (x:xs) in an expression on the right-hand side of an equation means “construct a new list with head x and tail xs”. The same notation in a pattern, on the left-hand side of an equation, means “expect a list, and take it apart, call its head x and its tail xs”.

In the example above the two patterns cater for all possible arguments, so there is no possibility of error.

More features related to patterns and function legs are available.

• A patterned leg may contain one or more guarded clauses. A guard is introduced after the patterns by a vertical bar. The keyword otherwise may be used in place of the last guard to improve readability:

abs:: Int -> Int
abs n | n >= 0    = n
      | otherwise = -n

• Patterns may contain literals. A literal in a pattern must match the calling argument, otherwise the leg is not selected. Here is a definition of fact which uses a literal pattern for the base case:

fact:: Int -> Int
fact 0 = 1
fact n | n > 0 = n * fact (n-1)




• An underscore in a pattern is a wildcard which matches any argument. It is used as a placeholder when we have no particular interest in the value of the argument.

• Alias patterns, sometimes called as-patterns, allow the programmer to view an argument in two distinct ways: sometimes as a whole, sometimes decomposed into its component parts. Alias patterns are always of the form id @ pattern where the identifier names the whole object, and the pattern deconstructs it into components. The syntax does not allow anything except an identifier on the left of the alias symbol. This example removes adjacent duplicates from a list. It uses both the underscore and alias features in the pattern.

remdups:: [Int] -> [Int]
remdups (x:xs@(y:_)) | x == y    = remdups xs
                     | otherwise = x : remdups xs
remdups xs = xs

• A pattern like x:b@y:z might appear ambiguous – is b an alias for y, or for y:z? Knowing the precedence rules will help us again. The alias symbol @ binds tighter than the cons operator (:), so the above pattern, when rewritten with explicit parentheses, is the same as x:(b@y):z. That makes it more obvious that b is an alias for y, not for y:z.

• No variable may occur more than once in a pattern. Such patterns are called linear. (Some functional languages, notably Miranda, allow non-linear patterns.)

• When a function is called, it may be the case that zero, one or many of the legs can match the caller’s arguments. In general, Haskell functions are partial, which means that a function need not be defined for all possible values of its arguments: if none of the legs match, the function is undefined for that particular set of arguments, and an error occurs. For example, fact may not be defined for negative arguments.

• If more than one leg matches the arguments, the legs overlap. To avoid ambiguity, we must say which leg will be used if more than one could be used. In Haskell the legs and clauses of multi-leg, multi-clause definitions are tested top-to-bottom, that is in the order in which they occur in the definitions. The system goes top-to-bottom, and the first leg that succeeds is chosen. When we write functions with overlapping patterns, we'll always arrange the legs so that the later patterns are more general than the early ones. (A small sketch after this list illustrates the point.)

• Legs and patterns which cannot possibly fail (e.g. a simple variable name) are said to be irrefutable.
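Here is the small sketch promised above (our own example, not from the notes). The two legs overlap, because the variable pattern n also matches 0. Since legs are tried top-to-bottom, the more specific leg is written first; if the legs were swapped, the variable pattern would always win and describe 0 would give "some other number".

describe:: Int -> String
describe 0 = "zero"
describe n = "some other number"

-- describe 0 → "zero"
-- describe 7 → "some other number"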




Note carefully the difference between the usage of [a] in a type signature, where the meaning is "a list of elements of type a" (more details are coming in the next chapter), and the same notation, [x] used in a pattern, where it means "matching only a list containing exactly one element, which we will call x".
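A tiny sketch (our own) showing both uses side by side: [a] in the signature means a list with elements of any type, while [x] in the first pattern matches only a list containing exactly one element.

hasExactlyOne:: [a] -> Bool
hasExactlyOne [x] = True
hasExactlyOne _   = False

-- hasExactlyOne "q"     → True
-- hasExactlyOne [1,2,3] → False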

This is not a complete list of the tricks and power of Haskell patterns: there are a few other mechanisms which we don’t cover in these notes.

The following table gives some examples of parameter patterns, and corresponding arguments. Each attempt can either succeed or fail. In the case of success, identifiers in the pattern obtain the bindings shown.

Function pattern     Caller's argument    Resulting bindings
x                    0                    x ← 0
0                    0                    none (succeeds)
(x:y)                [1,2]                x ← 1, y ← [2]
(x:y)                ["sam"]              x ← "sam", y ← []
(x:y)                "sam"                x ← 's', y ← "am"
(1:x)                [1,2]                x ← [2]
(1:x)                [2,3]                (fails)
(x:_:_:y)            [1, 2, 3, 4, 5, 6]   x ← 1, y ← [4,5,6]
[]                   []                   none (succeeds)
[x]                  ["sam"]              x ← "sam"
[1,x]                [1,2]                x ← 2
[x,y]                [1]                  (fails)
x@y                  0                    x ← 0, y ← 0
a@(x:b@(y:z))        [1,2,3,4]            a ← [1,2,3,4], x ← 1, b ← [2,3,4], y ← 2, z ← [3,4]

2.8 Exercises

1) Do these patterns and arguments match? If they do, what bindings result?

   • (x:y:z) [1,2,3,4]
   • [x,y,z] [1,2,3,4]
   • w@(x:y@z) [1,2,3,4]

2) Write a function which converts a day number (0 to 6) into a day name “Sunday”, “Monday” ...

3) Write a function to find the minimum value in a non-empty list.

4) Write a function which inserts an element into an ordered list in the appropriate position. Return the new list.

a. How many list constructor operations (i.e. how many uses of the operator (:)) will be needed (on average) to insert a random element into a list of length n?

b. What is the order (i.e. the Big-Oh formula) of the cost function for insert?

5) A list can be sorted by inserting its first element into the sorted tail.

a. Write a function which sorts a list of items.

b. Which sequences of inputs will give the best-case and worst-case performance for this sort?

c. What is the order of the average cost function?

2.9 Key Points

• A function declaration provides the type signature for a function.

• The definition gives the rules for computing a function.

• Basic types include Int, Integer, Char, Float, Double and Bool.

• The key data structure is the list.

• Strings are lists of characters, with a shorthand syntax.


• Patterns are used to decompose arguments to functions, and to select the appropriate leg.

• Each leg may contain many clauses, each with a boolean guard.

• Making the form of the function match the form of the data leads to clearer programs.

• Haskell uses program layout to group items.

• Haskell is case sensitive, and there are a number of different kinds of entities that you need to be clear about, and we classify these depending on what symbol or letter, upper or lowercase, the token or name starts with. (A token is a grouping of characters that are treated as a single entity by the compiler. ++, 123.27, "sam" and cos are four tokens in Haskell.)

o Functions like sqrt, length and any functions that you write yourself must always start with a small letter. They are prefix functions, i.e. they occur to the left of their arguments.

o Some functions are called operators — they are almost always binary functions (they take two operands) and are written between the operands. e.g. 6 + 12. Their tokens always begin with some special character other than the colon. These tokens can be more than one character, e.g. the ++ operator which we’ll meet in detail in the next chapters.

o A data constructor builds a data structure or data value of some sort. They too can be prefix or infix. All infix constructors must begin with a colon (:). We have only met one infix data constructor for building a new list from an element and an existing list, e.g. 23:[1,5,6,4]. Constants are a special case of data constructors, so we have also seen True, False, and []. Any alphabetic data constructor name must start with a capital letter (e.g. see Node or Leaf later in these notes). Data constructors can be used in patterns, whereas functions can not.

o Type names (Int, Bool) and module names always start with capital letters.

2.10 Built-in functions you should know

See the list of useful Haskell functions near the end of these notes. But at this time, you should know at least the basic functions that are available, like even, odd, abs, mod, div, sqrt, sin, cos, tan, pi, head, tail, :, null, length, sum, &&, ||, not, <, >, <=, >=, ==, /=, and the usual arithmetic operators.
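A few illustrative evaluations of these built-ins, written with the arrow meta-notation used earlier in the chapter (the particular values are our own examples):

even 10 → True
odd 7 → True
abs (-3) → 3
mod 17 5 → 2
div 17 5 → 3
null [] → True
head "sammy" → 's'
length [4,5,6] → 3
sum [1,2,3,4,5] → 15
3 /= 4 → True
not (3 < 4) → False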


3 Polymorphism, and Type Classes

3.1 Introduction

The tools in the previous chapter allow us to build this function to compute the length of a string.

length:: String -> Int
length []     = 0
length (_:xs) = 1 + length xs

Unfortunately, this definition cannot be used to count the length of a list of numbers, since its type is specifically declared to accept only strings. However, the definition in no way depends on the fact that the elements in the lists are characters. If we change the type declaration, we can use the same definition to find the length of some other kind of list, say a list of numbers.

Good programming practice and code re-use suggest that we should not have many different versions of the length function, one for lists of one type, one for lists of another type...

What is needed is to be able to specify that length operates on a list of “any type”. Haskell does this by using type variables in the declaration.

To generalize length so that it becomes polymorphic (i.e. many-typed) we rewrite its type signature, using a type variable, like this:

length:: [b] -> Int

Here the type variable b stands for “any type”, and the modified version of length is read as “maps a list of any type into an Int”. Type variables always start with lowercase letters. You can choose any name you like. But conventionally, we’ll use the single-letter type-variable names a, b, c ....

Types like Char, String, Int, Integer, [Int] are called concrete types – we know the type fully, and there is no polymorphism. So take note that although the function declaration is polymorphic, at any site in your program where this function is used, we must be able to tell exactly what concrete type the type variable b stands for at that site. For example, in the expression length "sammy" the type variable b would be instantiated to (substituted by) the concrete type Char. So at this site, length has the type [Char] -> Int. In the expression length [True, False, False] the type variable would be instantiated to Bool, and in the expression length [[1,2],[7,9],[3,8]], the type variable b would be instantiated to [Int]. So now you should be able to say what the type signature of length is at this site.
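To summarise the instantiations just described (the comments show the concrete type that length has at each site):

length "sammy"               -- here b is Char,  so length:: [Char] -> Int
length [True, False, False]  -- here b is Bool,  so length:: [Bool] -> Int
length [[1,2],[7,9],[3,8]]   -- here b is [Int], so length:: [[Int]] -> Int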

Polymorphism is not only useful for lists of elements of any type. In the next example, either the second or the third argument will be returned, depending on the value of the first. selectOne can operate on second and third arguments of any type (as long as both are the same type).

selectOne:: Bool -> a -> a -> a
selectOne True  x y = x
selectOne False x y = y
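For instance (again using the arrow meta-notation; the calls are our own examples):

selectOne True "yes" "no" → "yes"
selectOne (3 > 4) 1 2 → 2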

With the notation for polymorphism at our disposal we are now in the position to give the type signatures of the list constructor and selectors seen previously:

head:: [a] -> a
tail:: [a] -> [a]
(:) :: a -> [a] -> [a]
[]  :: [a]

If you are using the Hugs interpreter as your implementation of Haskell, at the command line interface you can query the type of any expression (say head) using :t head. You will get a type signature like the one shown above. (In the WinHugs system, use the Browse menu option for a more comprehensive look at what data constructors and functions are defined in Hugs, and what their type signatures are.)

Programming languages should be able to abstract smoothly over details that are not relevant, and should allow us to “factor out” and concentrate on the essential aspects of the problem.

The first-class status of Haskell functions (meaning that they can be put into lists, and passed as arguments to other functions) and the type-polymorphism are essential tools to help this process of “factoring out the constant parts”.

In the example above we have captured the essence of a length-finding algorithm. The polymorphism lets us solve the problem once, for lists with any types of element. Any language that forces you to code the same essence of the length function over and over again for different kinds of list, is failing to provide good abstraction tools. (As an eye-opener in your favourite language, write a single procedure to reverse the elements in an array. See whether you can make one procedure that can work for arrays of integer, and strings, and doubles, and student records.)

3.2 A Problem

Unfortunately, this mechanism is too general. It allows definitions of functions that can be used with all types (by using a type variable), or only with one type (by using some specific type).

Consider these two functions. The first one tests if an argument is present in a list, and the second one finds the minimum element in a non-empty list:

elem e []     = False
elem e (x:xs) = if (e == x) then True else elem e xs

minimum [x]    = x
minimum (x:xs) = if x < minimum xs then x else minimum xs




With the knowledge at our disposal so far, the type signatures would be as shown below. These signatures say that the functions can work for all types a. But there are unwanted assumptions here, namely that we are able to compare all types for equality or for order. (Because the definitions use (==) and <.)

elem :: a -> [a] -> Bool     -- WRONG
minimum:: [a] -> a           -- WRONG

In practice, not all types of objects are comparable, either for equality, or for order. It is not meaningful to ask whether the function sin is somehow “less than” the function cos. Other types that may not possess obvious order are complex numbers, windows, URLs, bitmaps, trees, or stacks.

So our problem is this: we want polymorphism, but until now we can only specify one concrete type, or all possible types. We need to be able to say “usable for some types, but not for all”. We need a way to group our types, and then say “works for any type in this or that group”.

So how do we write a polymorphic sort routine, but restrict it so that we cannot use it to try to sort lists of bitmaps?

3.3 Type Classes

Haskell provides a very general solution to this problem via the idea of type classes. Any number of types can be grouped in a class. A type can belong to many classes.

Then, we are able to restrict the type signatures so that they are polymorphic, but only usable for types within some groups. These restrictions are written like this:

elem :: (Eq a) => a -> [a] -> Bool
minimum:: (Ord b) => [b] -> b

The notation (Eq a) => is called a context, and limits the use of the function to those concrete types which “belong to the Eq group”. The Haskell terminology is a bit more rigorous — we say that the type variable a may only be instantiated to those concrete types which are instances of the class called Eq.

To join the class (i.e. for a type to be an instance of) Eq, a concrete type must define at least two member functions, (==) and (/=). A type such as Integer or Tree can be made an instance of Eq by supplying appropriate definitions of these two functions.





What this really tells us is that any type that is in class Eq will always have a way of testing elements for equality.

And provided type a can be tested for equality, elem can be used, and then it maps a -> [a] into Bool.

The class Ord can be extended by adding other types which can be ordered (i.e. for which functions <, <=, and so on have been defined). The definition of minimum above restricts its use to lists of elements which can be ordered with respect to each other.

In Haskell, types can belong to one or more classes. If you inspect the code in the Prelude, the class keyword introduces the class definition which dictates what functions its member types will implement. (The class can provide "default" implementations of some of its functions.) The instance keyword makes a specific type a member of a class, and provides the implementations to satisfy the class definition.

The Haskell terminology is confusingly different from C# or Java where objects are manufactured or constructed according to class specifications. What Haskell calls a class is more like the Java or C# interface – some abstract promises of functions that will be available - and in Haskell, a type can only become an instance of a class if it supplies the implementations to fulfil the class promises.
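As a small sketch of how these keywords fit together (our own example, not from the notes; data declarations are only covered properly in a later chapter), we invent a type Colour and make it an instance of Eq by supplying (==) and (/=):

data Colour = Red | Green | Blue

instance Eq Colour where
  Red   == Red   = True
  Green == Green = True
  Blue  == Blue  = True
  _     == _     = False
  x /= y         = not (x == y)

-- Now elem can be used on lists of Colour:
-- elem Red [Green, Blue] → False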

3.4 In Practice

Type signatures in Haskell are optional. The compiler always infers (works out) the type signature for a function, and compares it to one that you might have supplied.

An error occurs if you do supply a signature which is more general than the one the compiler infers for you (i.e. if you claim your function can work for any types, but the compiler knows that it cannot). This usually happens because you use one of the comparison operators, or because you call some other function that has a class restriction.

One way to avoid this problem is not to provide any explicit signatures; the other is to provide accurate context restrictions in any signatures that you write.
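A sketch of the kind of mistake meant here (our own example): the declared signature claims f works for every type a, but the body needs ordering, so the compiler will reject the declared signature.

f :: a -> a -> Bool      -- too general: rejected by the compiler
f x y = x < y

-- The accurate signature would be:
-- f :: (Ord a) => a -> a -> Bool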

It is not an error for the programmer to be too cautious: for example, you can specify via a type signature that your version of the length function can only be used on strings. The compiler will honour your restriction, even if it can spot that your code is general enough so that it could work for other lists too.
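A sketch of that situation (strLength is an invented name): the code below would work for any list, but the signature deliberately confines it to strings, and the compiler will respect that choice.

strLength :: String -> Int
strLength []     = 0
strLength (_:xs) = 1 + strLength xs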

As a beginner, there are only a handful of contexts that will be of concern (a short sketch after this list shows a typical signature for each):

• (Eq a) => ... Restrict a to types which have equality defined on them. Having equality (==) also implies that you can test for inequality (/=).

• (Ord a) => ... Restrict a to those types which can be ordered. Ord promises the operations <, <=, >, >=. Objects that belong to class Ord must also be members of class Eq, so equality is also defined.


• (Num a) => ... Restrict a to types in class Num. This class generally includes any kinds of numbers (Int, Integer, Float, Double, ...). For example, the following signature is typical: (+)::(Num a)=> a->a->a

• (Integral a) => ... The two built-in types, Int and Integer, are instances of this class. These are the types for which the functions like mod, div, rem, odd, even are defined.

• (Show a) => ... All types which are instances of this class have an “external” representation for their objects, i.e. they can be converted to a string by using the show::(Show a)=> a->String function.
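Here is the sketch promised above: one small, invented example per context. None of these names come from the Prelude.

same3 :: (Eq a) => a -> a -> a -> Bool          -- needs equality
same3 x y z = x == y && y == z

clamp :: (Ord a) => a -> a -> a -> a            -- needs an ordering
clamp lo hi x = max lo (min hi x)

twice :: (Num a) => a -> a                      -- any kind of number
twice x = x + x

isMultipleOf :: (Integral a) => a -> a -> Bool  -- Int or Integer
isMultipleOf m n = n `mod` m == 0

describe :: (Show a) => a -> String             -- anything printable
describe x = "value: " ++ show x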

Most of the common types like Int, Integer, Float, Double, String, [Int] are members of class Ord and, by implication, also members of Eq.

In Haskell, one also uses very general instance declarations for type constructors such as lists. This allows one to say “Provided type a is orderable/showable/integral, then we’ll allow [a] (a list of a) to also be orderable/showable/integral.” So without any new instance declarations, since Float is in Ord, Haskell also knows how to order [Float], and therefore also [[Float]] and also [[[Float]]] ..., and they’re all in Ord.
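A sketch of how such a conditional instance is written, using an invented list type MyList so as not to clash with the built-in instance for [a]:

data MyList a = Nil | Cons a (MyList a)

-- Provided the elements can be compared, so can the lists.
instance (Eq a) => Eq (MyList a) where
  Nil       == Nil       = True
  Cons x xs == Cons y ys = x == y && xs == ys
  _         == _         = False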


3.5 Exercises You can check your answers to many exercises by using your compiler. In the case of Hugs, the interactive command :t will show the type of any expression. If your function is already loaded, a command like :t head will give its type. But the easiest way to use this interactively (without having to first type the function definition into a file and load the file) is to “wrap” the function in a let statement, e.g. for the first exercise below, you can check the answer by typing this at the Hugs prompt: :t (let f y z = y < z in f)

1) Give the most general type signature for the Haskell function f y z = y < z

2) Give the most general type signature (or say if there is an error) for f x y z = (x < y) && (y < z)

3) Give the most general type signature (or say if there is an error) for f x y z = x < 3 && y < z

4) Give the most general type signature (or say if there is an error) for f y z = True

5) Give the most general type signature (or say if there is an error) for f x y z = if x then y else z

6) Will these work? Why or why not?

f1, f2, f3, f4 :: (Eq a) => a -> a -> Bool
f1 x y = x == y
f2 x y = x < y
f3 x y = False
f4 x y = x /= y

f5, f6, f7, f8 :: (Ord a) => a -> a -> Bool
f5 x y = x == y
f6 x y = x < y
f7 x y = False
f8 x y = x /= y

f9 :: (Num x) => x -> x -> Bool
f9 x y = x > y      -- can all numbers be ordered?

g1 :: (Eq a, Ord a, Num a) => a -> a -> a
g1 x y = if x < y then x + y else x - y

Give the most general type signature (or say if there is an error) for

7) f x (y:z) = x:y:z
8) f [x] (y:z) = x:2:y:z
9) f x (y:z) = x:y
10) f [x] (y:z) = x:y
11) f (x:xs) = xs++x
12) f (x:xs) = xs++[x]


13) f ([x]:xs) = xs++[x]
14) f [x] (y:z) = (x==7):y

15) What built-in concrete types can the function odd be applied to in Haskell?

3.6 Key Points • Type signatures can be parameterized with type variables, so that a single function can operate on elements of any type.

• When you use a polymorphic function in an expression, the compiler effectively substitutes the actual (concrete) type for the type variable. We say the type variable is instantiated to a concrete type.

• Haskell type classes allow us to group the concrete types.

• Context restrictions limit the polymorphism: when a context restriction is specified, the compiler cannot simply substitute any concrete type for a type variable: it must ensure that the substituted type is a member of the class given in the context restriction.

• The class makes promises that certain functions will be available, and ensures that the type implements these.

• The context (Eq a) => restricts substitutions of a to those concrete types for which equality is defined.

• The context (Ord a) => restricts a to those types for which an order is defined.

• The context (Num a) => restricts a to numbers.

• The context (Show a) => restricts a to types that can be displayed as a String.

• Haskell infers its type signatures, but it is good policy to write your own anyway.

• This is just the introduction. The class system in Haskell is much more powerful than the little we have seen here.

3.7 Built-in functions you should know Get to know these functions from the Haskell standard library: elem, notElem, minimum, maximum, min, max, nub, show.


4 Progress Through Recursion “To iterate is human, to recurse divine.”

L. Peter Deutsch

4.1 Next Examples This section proceeds with more examples, primarily aimed at improving familiarity with the language, the use of recursion to solve problems, and the case-by-case style of Haskell definitions.

4.1.1 Appending two lists We now present a function which appends two lists together. Guided by the structure of the data, we attempt to formulate the problem in terms of the two possible cases. The formulation question is “how can we append two lists in terms of some operation on the head, and some recursive operation on the tail? ” This formulation has much in common with proofs by induction, so much so that we will refer to the two cases of the definition as the base case and the inductive case. More attention will be given to this relationship later. The solution uses decomposition on the first argument:

The infixr keyword declares the token ++ (pronounced append) to be right associative with binding precedence 5. The type signature indicates that ++ expects two lists, and returns a list of the same type. (The three occurrences of the type variable a must stand for the same thing in any use of ++.) Note that ++ requires O(n) steps to perform its task, where n is the length of its first argument. We say the cost function is order n.

4.1.2 take and drop The next examples are a pair of complementary functions. take retains the leading n elements of a list (or as many as it can if n is bigger than the length of the list).

infixr 5 ++

(++) :: [a] -> [a] -> [a]
[] ++ ys     = ys
(x:xs) ++ ys = x : (xs ++ ys)

"hel" ++ "lo"      ⇒ "hello"
[1,2,3] ++ [4,5]   ⇒ [1,2,3,4,5]


drop retains those elements that remain after n have been taken. If the list contains less than n elements, the result is [].

Here is an important identity relating take and drop: take n xs ++ drop n xs = xs
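One quick (if informal) way to convince yourself of this identity is to test it for particular cases; checkTakeDrop is an invented name:

checkTakeDrop :: Int -> [Int] -> Bool
checkTakeDrop n xs = take n xs ++ drop n xs == xs

checkTakeDrop 2 [1,2,3,4] ⇒ True
checkTakeDrop 7 [1,2,3,4] ⇒ True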

4.1.3 Number Conversions Next we consider some problems of converting decimal numbers into other bases. The first converts a non-negative number into a binary representation. The algorithm for converting a number to a target base is to repeatedly divide the number by that base, keeping the remainder at every step. The first remainder which this algorithm yields is the least significant bit, the next remainder is the next least significant bit. The problem has a naturally “recursive” subproblem, in that one can calculate the least significant digit and then recursively do the rest of the conversion.

bin :: Int -> [Int]
bin 0 = []
bin n | n > 0 = bin (div n 2) ++ [mod n 2]

bin 26 ⇒ [1, 1, 0, 1, 0]

drop :: Int -> [a] -> [a]
drop n xs | n <= 0 = xs
drop _ []          = []
drop n (_:xs)      = drop (n-1) xs

drop 0 [1,2,3,4] ⇒ [1,2,3,4]
drop 2 [1,2,3,4] ⇒ [3,4]
drop 7 [1,2,3,4] ⇒ []

take :: Int -> [a] -> [a]
take n _ | n <= 0 = []
take _ []         = []
take n (x:xs)     = x : take (n-1) xs

take 0 [1,2,3,4] ⇒ []
take 2 [1,2,3,4] ⇒ [1,2]
take 7 [1,2,3,4] ⇒ [1,2,3,4]


If we abstract out the base used in the above example, we can do conversions into an arbitrary base, and treat the binary, octal, decimal or hexadecimal conversions as specific instances of the general case. In this example we return a String instead of a list of Int:

The example also shows a shorthand notation that permits the types of more than one function to be declared with a single signature. Also note the use of the function intToDigit, which is defined in the Haskell package called Char.

Two simple extensions are left as exercises. The result can be padded with leading zeros to return a fixed length result, and negative numbers can be handled by defining a one- or twos- complement function, and introducing another clause in the definition of baseCvt.

import Char     -- because it provides intToDigit

baseCvt :: Int -> Int -> String
baseCvt b n
  | n == 0 = ""
  | n > 0  = baseCvt b (div n b) ++ [intToDigit (mod n b)]

hex, dec, oct, bin :: Int -> String
hex n = baseCvt 16 n
dec n = baseCvt 10 n
oct n = baseCvt 8 n
bin n = baseCvt 2 n

bin 167 ⇒ "10100111"
dec 167 ⇒ "167"
oct 167 ⇒ "247"
hex 167 ⇒ "a7"

4.1.4 Reversing a list - try 1 Consider an algorithm for reversing the elements of a list. Once again we are guided by the form of the data structure, and formulate the procedure in terms of some operation on the head, combined with the result of recursively reversing the tail. This gives the following:

The definition of rev states that the reverse of the empty list is the empty list, and the reverse of a non-empty list is the reversed tail, appended to a list containing the head. ++ requires that both its arguments are lists, which accounts for the list brackets surrounding element x in the second leg.

A trace of the calls to rev would yield something like this: (We make some simplifying assumptions for now about the order in which things get done. This may not correspond exactly to an actual implementation, but more about that later).

How many constructor steps are needed to reverse a list of length k? The steps needed by ++ depend on the length of its first argument. We have noted that the result of rev gets built while we are exiting the recursion, as follows: [] ⇒ "m" ⇒ "ma" ⇒ "mas" ...

As each character is added by appending it, the number of steps needed by ++ increases:

0 + 1 + 2 + 3 + ... + n, which is n(n+1)/2, or O(n²).

This means that the running time of the algorithm is proportional to the square of the number of items: i.e. it will take 9 times longer to reverse a 300 character string than it does to reverse a 100 character string. (Try it!)

4.1.5 Styles of Recursion All the examples so far in this chapter use a technique called backward recursion, because the significant work (building the new list, or doing the addition or other processing of the elements) occurs as the recursive levels are exited. Lists are therefore processed backward: in rev, for example, the last element of the original list is the first one inserted into the new list.

A key idea in backward recursion is “solve the smaller subproblem first, then come back to solving the whole problem”.

rev "sam" à (rev "am") ++ "s" à ((rev "m") ++ "a") ++ "s" à (((rev []) ++ "m") ++ "a") ++ "s" à (([] ++ "m") ++ "a") ++ "s" à ("m" ++ "a") ++ "s" à "ma" ++ "s" à "mas"

rev :: [a] -> [a]
rev []     = []
rev (x:xs) = rev xs ++ [x]

rev "sammy" ⇒ "ymmas"



In an implementation, each call of any function requires a stack frame — a chunk of memory on a run-time stack which holds information about the arguments and how to get back to where you were called from. One disadvantage of recursion over loops is that many recursive calls can create a need for a huge amount of stack space. So your system may fail to execute what appears to be a very simple program. By contrast, looping doesn’t require new stack frames.

If each leg in a body of a recursive function gives rise to at most one call to itself, the function is linear recursive. Linear recursive functions have attracted attention because the recursion can always be removed in favour of a more efficient loop[27]. (We stress that this is an optimization that your implementation should do: not one that we think a programmer should have to do.)

4.1.6 Forward Recursion One particular class of linear recursive functions is very easy to transform into a loop: these are the tail recursive functions. A function is tail-recursive if the last action it performs is a call to itself. Functional programmers often employ a tail-recursive solution in preference to other solutions, because it will generally have much better stack-space usage.

The style is often associated with forward recursion, where the active work gets done as the recursion levels are entered. This involves adding extra accumulating parameters[36], which are introduced into the function to hold the result as it is built up.
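A small illustration of the difference, using invented names: sumBack does its addition after the recursive call returns (backward recursion), while sumFwd is tail recursive and carries the partial result in an accumulating parameter.

sumBack :: [Int] -> Int
sumBack []     = 0
sumBack (x:xs) = x + sumBack xs

sumFwd :: [Int] -> Int
sumFwd xs = go xs 0
  where go []     acc = acc
        go (y:ys) acc = go ys (acc + y)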

A forward-recursive reversal algorithm which is linear-recursive and also tail-recursive is the following:

The definition contains a local definition of a recursive function rev. (More about local definitions in the next section.) reverse initializes the extra accumulating parameter in rev by passing an initial value, in this case []. As rev calls itself recursively it extends this result, until it finally reaches the bottom of the recursion with the complete answer. No list appending is performed in this version of reverse. This accumulating parameter version of reverse has O(n) (linear) performance.

It is not always the case that forward recursion will have a better cost function than backward recursion. In this particular case of reverse, the backward recursive algorithm had to keep putting its elements “at the end” of the work it had done so far.

reverse :: [a] -> [a]
reverse xs = rev xs []
  where rev [] ys     = ys
        rev (x:xs) ys = rev xs (x:ys)

reverse "sammy" ⇒ "ymmas"


This was the source of the inefficiency. Forward recursion is very natural to use in a case like this: the first item of the argument list is processed and the result is accumulated in the accumulating parameter, which is eventually going to become the result of the whole computation. As the algorithm runs, the accumulating parameter automatically builds up in the “back-to-front” order.

So forward recursion would probably be a bad choice if we wanted the result list to be in the same order as the argument list. If we ever find ourselves having to continually work at the “end” of some accumulating answer, the inefficiency would reappear. When we work with linked lists, especially in functional languages where we never overwrite pointers or links, we'll always try to work near the front of the list.

Here are three simple functions, all of which double each of their elements. Work out the cost function of each and how much stack space each needs to run (under the assumption that your system doesn’t need extra stack space for tail recursion), and see which you think is the best all-round version.

Unlike backward recursion that always tries to first solve a smaller subproblem, forward recursion tends to handle the work of the current recursive call before going down to the next recursive level.

4.2 Local Definitions reverse uses an auxiliary definition to perform its task. The expression is said to be qualified by the local definition(s). Local definitions come in two flavours:

• The let form introduces new auxiliary definitions which qualify an expression. The let can be used wherever a normal expression is valid. The syntax is let <local-definitions> in <qualified-expression>

• The where form is used to introduce new bindings which scope over a number of guarded clauses in a leg of a definition.

The local definitions come into scope simultaneously (i.e. they can all see and call each other).

doubleAll1 []     = []
doubleAll1 (x:xs) = x*2 : doubleAll1 xs

doubleAll2 xs = da xs []
  where da [] r     = r
        da (x:xs) r = da xs (r ++ [x*2])

doubleAll3 xs = reverse (da xs [])
  where da [] r     = r
        da (x:xs) r = da xs ((x*2):r)

doubleAll1 [1,2,3,4,5] ⇒ [2,4,6,8,10]
doubleAll2 [1,2,3,4,5] ⇒ [2,4,6,8,10]
doubleAll3 [1,2,3,4,5] ⇒ [2,4,6,8,10]


They can be recursive, and can occur in any order. The expressions within local definitions can contain other local definitions, allowing nested blocks of functions within other functions. Non-local identifiers in enclosing blocks can be accessed.

Local definitions will usually define new functions, but it is also possible to directly use a pattern on the left hand side, and to bind a value to that pattern. Recall that simple identifiers are patterns. The right side will be evaluated and decomposed according to the pattern on the left. No guards or literals are allowed for such decompositions, and a run-time error results if the right-hand side cannot be decomposed to fit the pattern.

In this example, double is a function while ten, a, b and cs are simple identifiers that get bound to some value. The function f expects an even-length list of numbers (can you see where this is required?), and returns a new list by combining those pairs, and multiplying the results by a constant. It will fail for an odd-length list.

Recall that local definitions introduce their own offside column at the start of the first definition after the let or where. The definition layouts are subject to the same multi-leg, multi-clause rules given previously. "Sensible" program layout should not give any special problems.

The next example shows the use of the where: Unlike the let, the y and u defined by the where are "in scope" in both the guarded clauses in the single leg.

4.3 Examples

4.3.1 Greatest Common Divisor

This algorithm for finding the gcd was first described by Euclid (c. 300 BC), some time before functional programmers were around.

g x | u > 0     = x + y
    | otherwise = x - y
  where y = 17
        u = x - 3

f [] = []
f xs = let double x = 2 * x
           ten      = 10
           (a:b:cs) = xs
       in double (ten*(a+b)) : f cs

f [2,3,4,5] ⇒ [100,180]


4.3.2 Fibonacci Numbers

The next example finds the n’th term of the Fibonacci sequence 1,1,2,3,5,8...

The algorithm uses the recurrence relation which notes that the n’th number in the series is the sum of the previous two numbers. This algorithm is sometimes criticized as being a bad use of recursion, because the double recursion usually makes it very inefficient to compute. Implementations that can memo[43] the function (remember the arguments and results from previous calls, and use a table lookup for computation) will have no problems with efficiency. Furthermore, this solution accurately reflects the nature of the recurrence relations on which the sequence is based: the alternatives often do not.

In Chapter 5 of Eisenbach[25], Darlington shows how this example can be mechanically transformed into an efficient linear-recursive version.

For those who are duly concerned about efficiency, this version computes the same function but in linear time. It uses a variation of the accumulating parameter technique: two extra parameters are introduced to hold the two most recent terms in the sequence.

It is very easy to say "here is one function that does something, here is another that also does the same thing". But how can we be sure that they really do compute the same results? One of the exercises later in these notes will require a proof that these two functions are equivalent.

fib n = fib_aux n 0 1
  where fib_aux 1 p q = q
        fib_aux n p q = fib_aux (n-1) q (p+q)

fib 20 ⇒ 6765

fib 1 = 1
fib 2 = 1
fib n | n > 2 = fib (n-2) + fib (n-1)

fib 10 ⇒ 55

gcd :: Int -> Int -> Int
gcd x y = gcd_aux (abs x) (abs y)
  where gcd_aux x 0 = x
        gcd_aux x y = gcd_aux y (x `mod` y)

gcd 24 16 ⇒ 8


4.4 Exercises Many of the examples and exercises are from the prelude. Define your own names so that they do not clash with the prelude names. A study of the prelude will be useful.

1) Consider the number conversion function baseCvt in Section 4.1.3. How many times is it called in order to compute the binary representation of 26? The resulting list is constructed using ++ which in turn uses the more primitive constructor repeatedly. How many cons operations are needed to compute the binary representation of 26? Give the general cost function of baseCvt. What is the order of the cost function? Write a version of baseCvt with a linear cost function. Hint: for any natural n, the result in base b contains ⌈log_b n⌉ digits, where ⌈x⌉ denotes the ceiling of x (the next integer bigger than or equal to x).

2) Which of these functions defined in this chapter are forward-recursive? Backward-recursive? Linear-recursive? Tail-recursive? leng, sum, fact, rev, reverse, take, drop, bin, gcd, fib, fib_aux. Give a forward-recursive definition for bin.

3) Write your own version of the elem x xs function which tests whether x is an element of list xs.

4) Write a function which returns a list of proper factors of a number. (Proper factors exclude 1 and the number itself.)

5) Use the above function to write a function which tests if its argument is a perfect number. A positive number is perfect if its proper factors sum to one less than itself. Find the first three perfect numbers.

6) Write a function which tests whether its argument list is in non-descending order.

7) Write a function which merges two sorted lists to produce a third sorted list. The cost function of the merge must be linear.

4.5 Key Points • Recursion is abundant, and used instead of looping.

• Backward recursion is often natural, and performs its key tasks while exiting the recursive levels.

• Forward recursion accumulates the answer as it goes into the recursive levels. It usually requires an extra accumulating parameter to hold the partial result. It is most usually associated with processing elements in a list from the leftmost one to the right.


• Tail-recursive functions are easy to optimize, and forward recursion tends to create these. Forward recursion is a stylistic trick-of-the-trade for the functional programmer.

• Local let and where definitions improve clarity and modularity of the code. They impose a block structuring on the language, with lexical scope rules for non-local identifiers.

• Local definitions introduce new functions.

• A local definition of the form <pattern> = <expr> can be used to decompose a value via the pattern matching.

• A let definition qualifies a single expression. A where expression qualifies a number of guarded clauses.

4.6 Built-in functions you should know Get to know these functions from the Haskell standard library: ++, take, drop, reverse, gcd, init, last, takeWhile, dropWhile


5 Type Synonyms and Tuples

5.1 Type Synonyms Haskell allows type synonyms to be declared using the keyword type. A type synonym acts primarily as a renaming shorthand for a more complex type expression. For example, we have already mentioned that the system uses a type synonym definition

type String = [Char]

This permits the identifier String to occur in any type expression in place of [Char]. The alternatives are freely interchangeable (i.e. the type system is based on structural equivalence rather than name equivalence). The type synonym definitions can be parameterized. A matrix of arbitrary elements might be represented as a list of lists. This choice of data representation can be captured with a synonym declaration.

5.2 Tuple Types Haskell permits grouping items together. A tuple (some pronounce this as “tuepel” and some as “tupple”) is constructed by enclosing the field components in parentheses, separated by commas. (A tuple always contains at least two components.) The tupling notation may also serve as a pattern on the left of a definition to decompose a tuple and name its components.

The tuple with two elements occurs so often that we give it a special name, the pair, and the prelude provides two selector functions on the components of a pair:

As we might expect, there are also functions to combine two lists into a list of pairs, and to split a list of pairs into two lists.

fst :: (a,b) -> a
fst (x,_) = x

snd :: (a,b) -> b
snd (_,y) = y

type Matrix a = [[a]]

transpose :: Matrix a -> Matrix a
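A possible body for that transpose signature, sketched here under the assumption that the matrix is rectangular (all rows have the same length); heads and tails are invented local helpers:

transpose :: Matrix a -> Matrix a
transpose []     = []
transpose ([]:_) = []
transpose rows   = heads rows : transpose (tails rows)
  where heads ((x:_):rs)  = x : heads rs
        heads []          = []
        tails ((_:xs):rs) = xs : tails rs
        tails []          = []

transpose [[1,2,3],[4,5,6]] ⇒ [[1,4],[2,5],[3,6]]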


A function must always produce a single result. Tuples are particularly useful when we want a function to return a number of different results – we can bundle the parts together and return one composite value.

For example, doStats might be a function to process a (non-empty) list of a batsman’s cricket scores and to perform some statistical analyses: the number of innings, the sum, the average, the lowest and the highest.

Recursive functions that use tuples have more interesting and tricky patterns than functions that return simple values. In the unzip function above we needed to return the two lists, but because a function can only return a single value, we bundle the lists into a pair. We call unzip ps recursively, obtain a pair of lists, and use a pattern binding to separate the pair into its two component lists, us and vs. The “work” at the current level of recursion conses the new elements u and v onto lists us and vs respectively, and returns a pair of these new results.

We’ll illustrate this pattern in more detail now by returning to the cricket scores. We’ll just solve the simpler problem of returning the minimum and maximum scores from a non-empty list. If we use the cricket example above, we’d have to traverse the list twice: once to compute the minimum, and again to find the maximum. Can we do this with a function that only traverses the list once, but returns both the values we need? (There are built-in functions min and max that will return the smaller (larger) of two values of an orderable type.)

doStats :: [Int] -> (Int, Int, Double, Int, Int)
doStats ss = (length ss, sum ss,
              fromInt (sum ss) / fromInt (length ss),   -- convert both so / works at Double
              minimum ss, maximum ss)

gibbs = [17,5,63,114,12,0,4]

doStats gibbs ⇒ (7,215,30.71,0,114)

zip :: [a] -> [b] -> [(a,b)]
zip (u:us) (v:vs) = (u,v) : zip us vs
zip [] _ = []
zip _ [] = []

unzip :: [(a,b)] -> ([a],[b])
unzip []         = ([], [])
unzip ((u,v):ps) = let (us, vs) = unzip ps
                   in (u:us, v:vs)


There is an even more elegant, one-pass solution for minandmax using forward recursion, but we’ll leave that as an exercise for the reader.

5.3 Timeout What’s the point? You only save one pass over the list, so you probably do half the work, but each function is more complicated. So all in all, you probably lose.

True, for this case. But not in all cases.

Later in the notes we’ll be working with trees rather than lists. An algorithm which has to process each of its sub-trees twice (and then for those sub-trees each sub-sub-tree gets processed twice) has exponential running time. If you are able to reduce two passes of each subtree to one pass, you don’t just halve your work: you fundamentally reduce the order of the cost function. So we’ll see an example of an algorithm that, because it returns a pair of results, can run in linear time, whereas another solution that makes separate calls to compute each result is an O(2^n) algorithm — that is when this technique really pays off.

The next example illustrates the use of tuples and synonym types, and decomposition of tuples by pattern matching.

minandmax :: (Ord a) => [a] -> (a,a)
minandmax [e]    = (e, e)     -- base case is a singleton list
minandmax (e:es) = let (oldmin, oldmax) = minandmax es
                   in (min e oldmin, max e oldmax)

minandmax gibbs ⇒ (0, 114)


It should be clear that the function younger is a version of min — something that returns the smaller of two. And youngest is analogous to minimum — a function that returns the smallest in a list.

So could we have more simply said this instead???

Firstly, do you think tuples are likely to be orderable? And if so, should all tuples be orderable, or only some? What are the constraints? And if you think they are probably orderable, how are they likely to order these family members? Use the software to check out what Haskell does in this case, and examine the prelude to see whether tuples are orderable.

5.4 Exercises 1. Write a function which passes once over a non-empty list of integers, and returns a pair containing the sum and the product of the elements. Use backward recursion.

2. Re-do the above exercise using forward recursion. Make it as simple and efficient as you can.

3. Generalize the above so that it works for any type of numbers.

...
main = let (name, _, _) = minimum family
       in "The youngest member is " ++ name

type Person = (String,Int,Bool)     -- (name, age, isMale)

younger :: Person -> Person -> Person
younger p@(_,age1,_) q@(_,age2,_) = if age1 < age2 then p else q

youngest :: [Person] -> Person      -- find youngest of non-nil list.
youngest (p:[]) = p
youngest (p:ps) = younger p (youngest ps)

family :: [Person]
family = [("Johnny", 7, True), ("Sally", 3, False),
          ("Daddy", 39, True), ("Mummy", 34, False)]

main    -- extract the name of the youngest
  = let (name, _, _) = youngest family
    in "The youngest member is " ++ name

main ⇒ "The youngest member is Sally"


5.5 Key Points • Type synonyms are simply a shorthand for convenience. As such, they don’t add any new type safety to Haskell.

• A better mechanism for data structures will be introduced in chapter 7.

• A tuple is a multi-field record. Like the [x] notation for lists, the tuple notation is used in three places in Haskell:

i. in type signatures to describe the type of a tuple,

ii. in function code to build tuples, and

iii. in patterns to match tuples and name the fields.

• The pair tuple is so common that some special support exists in the prelude.

• An important use of tuples is to bundle together multiple results so that they can be returned as a single value from a function.

• There is an interesting, perhaps non-obvious, pattern of construction for recursive functions that return tuples.

5.6 Built-in functions you should know Get to know these functions from the Haskell standard library: fst, snd, zip, unzip, zipWith, span, splitAt, words, unwords.


6 Generators, Filters and List Comprehensions

In this chapter we begin to look at more powerful ways of manipulating lists of items, under the general heading of list operators. List operators are functions that operate on whole lists at once. What we want to do here is to start you thinking “list-at-a-time” as opposed to “element-at-a-time”.

6.1 Generators The first section looks at generators, starting with a simple case, and illustrating Haskell’s ability to abstract smoothly towards more general cases. Haskell provides a shorthand notation to express a list of values:

[1..10]    ⇒ [1,2,3,4,5,6,7,8,9,10]
[7..8]     ⇒ [7,8]
[7..4]     ⇒ []
[2..]      ⇒ [2,3,4,5,...]
[2,4..10]  ⇒ [2,4,6,8,10]
[2,5..]    ⇒ [2,5,8,...]
[10,9..1]  ⇒ [10,9,8,7,6,5,4,3,2,1]

These generate an arithmetic progression from some initial value to some (optional) terminating value, with an optional increment that defaults to 1. The bounds need not be literals, but can be arbitrary expressions which are determined at run-time. An increment value (which can also be zero or negative) is inferred if you explicitly provide the second element in the sequence.

6.2 Filters A filter removes certain elements from a stream (list) of input values, and produces a stream of the remaining items. It therefore maps one list into a sublist of it.

A list is filtered with respect to some test condition, or predicate. (a predicate function is one that returns a Bool). Here is a possible definition of filter which is provided as a standard function in Haskell. It retains all elements from the stream which satisfy the predicate.

filter :: (a -> Bool) -> [a] -> [a]
filter p [] = []
filter p (a:u)
  | p a       = a : filter p u
  | otherwise = filter p u

filter odd [1..10] ⇒ [1, 3, 5, 7, 9]


6.3 map Another very useful list operator is the map function, which applies a function to every element in a list, and returns a new list as its result. One definition of map is shown here.

6.4 List Comprehensions List comprehensions were popularised by Turner[62], where he called them ZF expressions (from Zermelo-Frankel set theory). They combine generators and filters into a single powerful notation. We demonstrate with a simple example:

A list comprehension always produces a result that is a list. Each element of the resulting value list is produced in turn from the qualified expression part of the comprehension. (The qualified expression is the expression before the vertical bar, in this example, simply the expression x.)

The qualified expression is followed by one or more qualifiers, which are separated by commas. Each qualifier can be one of:

• A generator expression of the form x <- list, which loops over and generates a list of candidate values,

• a filter expression (a predicate which returns Bool), which is used to retain or eliminate candidates.

• a let binding, which introduces a new named value into scope.

The whole list comprehension is always enclosed in list brackets.

The qualified expressions, generator expressions, and filter expressions can all be arbitrarily complex, and can depend on any local identifiers that are introduced by the generating clauses.

[ x | x <- [1..10], odd x] ⇒ [1,3,5,7,9]

map :: (a -> b) -> [a] -> [b]
map f []    = []
map f (a:u) = f a : map f u

map sqrt [1..4]  ⇒ [1.0, 1.41, 1.73, 2.0]
map odd [1..5]   ⇒ [True, False, True, False, True]
map length ["beta","delta","U2"] ⇒ [4, 5, 2]


The third example above shows that the qualified expression need not depend on the variables in the generators. The qualified expression in the fifth example itself contains a generator which depends on n.

6.4.1 Multiple generators The examples above can probably be very loosely compared to simple (unnested) loops in conventional languages. Now we examine having more than one generator in a comprehension — loosely corresponding to having nested loops in your conventional language.

Generators introduce new local identifiers which come into scope from left to right (i.e. a generator clause can depend on other generators to the left of it, but not vice-versa).

The first two examples show typical cross-product operations which return lists of tuples.

6.5 Discussion and Examples The first example uses the definition of the family from the example in Section 5.3, and uses pattern matching to extract the names of all family members.

[(x,y) | x <- [9,8,7], y <- [1,2]]
    ⇒ [(9,1),(9,2),(8,1),(8,2),(7,1),(7,2)]

[(x,y) | x <- [1..10], odd x, y <- [1..x], even y, x+y > 9]
    ⇒ [(7,4), (7,6), (9,2), (9,4), (9,6), (9,8)]

[x^y | x <- [2,3,4], y <- [1..x]]
    ⇒ [2, 4, 3, 9, 27, 4, 16, 64, 256]

[ (i,j) | i <- [1..3], let k = i*i, j <- [1..k] ]

List comprehensions, like the map function, process each list candidate element independently from every other. So these are not suitable for trying to express functions which depend on the position or ordering of elements in a list, or on some combination of the elements. Summing a list, or testing if a list contains adjacent duplicate elements, or testing if list elements are in ascending order cannot be elegantly programmed with these notations.

f xs = [ 2*x^2 + 3*x + 7 | x <- reverse xs, even x]
f [1,2,3,6,5,4] ⇒ [51,97,21]

[ 2*x | x <- [1..5], odd x]      ⇒ [2, 6, 10]
[ "hi" | x <- [1..5] ]           ⇒ ["hi", "hi", "hi", "hi", "hi"]
[ (x,x*x) | x <- [1,7,2,4] ]     ⇒ [(1,1), (7,49), (2,4), (4,16)]
[ [1..n] | n <- [1..4] ]         ⇒ [[1], [1,2], [1,2,3], [1,2,3,4]]



Don’t make the mistake of using two generators in a comprehension to attempt to process corresponding elements of two lists. For example, given lists xs and ys, one cannot use a "nested" generator to add the first element of the xs to the first of the ys, the second of the xs to the second of the ys, etc. You would do better to zip the lists first, then use a single comprehension to process them “pair-at-a-time”.

This example also shows a feature: in any generator p <- g, it is permissible for g to be any Haskell expression that yields a list, and p can even be a pattern rather than a simple variable. Because pattern matching can fail or succeed, this is used as an extra kind of filter: if the pattern match fails for some elements generated by g, these elements are simply ignored.

The next example shows a wildcard in the pattern. The generator loops over each row of a matrix (represented here as a list of lists) in turn, and uses the pattern to extract the head of that row. So the function returns a list of the heads of every row, i.e. the first column of a matrix.

The next example is Quicksort. Recall that Quicksort sorts a list by selecting an arbitrary item from the list, and using that as a pivot. It then partitions the remaining items so that all items less than the pivot are lumped together on one side, say the left, and all items bigger than (or equal to) the pivot go to the other side. By recursively sorting the two halves, and then appending together the sorted left half, the pivot, and the sorted right half, the list is fully sorted. Extracting all items that are less than some pivot is a filtering problem which is elegantly formulated as a list comprehension: [ e | e <- x, e < pivot ] The algorithm follows directly:

type Matrix a = [[a]]

firstCol :: Matrix a -> [a]
firstCol m = [ e | (e:_) <- m ]

sum2 xs ys = [ x+y | (x,y) <- zip xs ys ]

sum2 [3,4,5] [16,10,104] ⇒ [19, 14, 109]

familyNames :: [String]
familyNames = [ name | (name, _, _) <- family ]


Probably the most interesting aspect of this example is that this program specifies the Quicksort much more lucidly, more precisely, and more concisely than the English description we used to introduce the example! In addition, it is directly executable and testable.

The efficiency of the Quicksort algorithm can be marginally improved by making only one pass over the list in order to do the partitioning step. This sacrifices some clarity, but can be formulated using accumulating parameters as shown.

The next example finds all possible permutations of a list of elements. The recursive solution essentially draws each element from the list, and for each of these cases attaches the element to every permutation of the remaining list. The bag difference function \\ (only available if you import the module List) returns its first argument after removing some items: each element in the second argument causes the deletion of the first matching item in the first argument, if it exists. The expression xs \\ [e] therefore removes one occurrence of e from xs.

qs :: Ord a => [a] -> [a]
qs []     = []
qs (p:xs) = let (smalls,bigs) = partition xs [] []
                partition [] ss bs = (ss,bs)
                partition (x:xs) ss bs
                  | x < p     = partition xs (x:ss) bs
                  | otherwise = partition xs ss (x:bs)
            in qs smalls ++ [p] ++ qs bigs

qs :: Ord a => [a] -> [a]
qs []     = []
qs (p:xs) = let smalls = [ e | e <- xs, e < p]
                bigs   = [ e | e <- xs, e >= p]
            in qs smalls ++ [p] ++ qs bigs

qs ["beta","delta","omega","alpha"]
    ⇒ ["alpha","beta","delta","omega"]


Although the above is an elegant formulation of perms, the bag difference operator requires that the elements are in class Eq. The following version of perms gets around this restriction, and is considerably more efficient (except perhaps while trying to understand it)!

6.6 Exercises 1. The perms function defined above will return multiple occurrences of the same permutation of objects if duplicates appear in the original list. Write a function to remove all duplicates, so that uniquePerms [1,2,2] ⇒ [[1,2,2], [2,1,2], [2,2,1]]

2. Find all Pythagorean triples (a,b,c), a ≤ b < c < 30, whose integer sides represent the lengths of right-angled triangles. Some examples are (3,4,5) and (5,12,13).

3. Goldbach’s conjecture is that every even number greater than 2 can be expressed as the sum of two primes. For each candidate number (up to a limit of 50) find all pairs of primes which sum to that number.

4. Some even numbers greater than 2 have unique decompositions into the sum of two primes. Others have multiple solutions, e.g. 124 can be expressed as 11+113, or 17+107, or 23+101, etc. Modify the previous program to only find those numbers with unique decompositions.

5. The exercises in Section 4.4 which dealt with factors and perfect numbers are much easier to write using list comprehensions. Rewrite them in this style.

6. Write your own implementation of the map function that does not use explicit recursion.

perms :: [a] -> [[a]]
perms []     = [[]]
perms (x:xs) = [zs | ys <- perms xs, zs <- interleave x ys]
  where interleave x []       = [[x]]
        interleave x ys@(y:z) = (x:ys) : [y:v | v <- interleave x z]

import List

perms :: Eq a => [a] -> [[a]]
perms [] = [[]]
perms xs = [e:p | e <- xs, let ws = delete e xs, p <- perms ws]

perms []     ⇒ [[]]
perms [1]    ⇒ [[1]]
perms [1..3] ⇒ [[1,2,3], [1,3,2], [2,1,3], ... ]


7. Is this always true for any predicate p, and for any possible lists xs and ys? filter p (xs++ys) = (filter p xs) ++ (filter p ys) (Later in these notes we’ll tackle some formal methods for proving identities, and we’ll get a definite answer to this question.)

8. Find a specific case that shows that the “law” below does not always work. In general, under what conditions will this fail to hold? filter p (filter q xs) = filter q (filter p xs)

9. Under what circumstances could the two generators x<-g1 and y<-g2 be interchanged (without a change in meaning, of course) in the expression sort [ qexpr | x <- g1, y <- g2]

10. Assume p1 x is a predicate which is only true for 10% of the items in g1, while p2 y is a predicate which is true for 90% of the items in g2, and that the generators are independent of each other. Assume also that both predicates are total, that is, defined for all possible values in their domains. Which of the following expressions is likely to be the most efficient, and by how much?

• [qexpr | x <- g1, y <- g2, p1 x && p2 y]
• [qexpr | x <- g1, p1 x, y <- g2, p2 y]
• [qexpr | y <- g2, p2 y, x <- g1, p1 x]

6.7 Key Points • Manipulating lists as a whole rather than item at a time can be useful.

• There is an easy notation to generate lists.

• A filter selects some items from a list.

• List comprehensions combine generators and filters into a very elegant and powerful programming construct.

6.8 Built-in functions you should know Get to know these functions from the Haskell standard library: map, filter.


7 Data Types

7.1 Data Declarations

7.1.1 Simple multi-field records Haskell permits the introduction of new data types via data declarations.

For example, a student might be represented by a string (for their name), and an integer (for their age):

The new type name is StudentT. The constructor for values of this type is called Student. The constructor acts as a kind of wrapper which packs the two fields together into a single value, and also uniquely identifies the type of the new record. The Haskell expression (Student "Peter" 21) will create a new data structure of type StudentT.

Note in the data declarations that only the types of the fields are given — we do not say that the string is intended for the student’s name, or the integer is their age. The data declaration describes the structure rather than attempting to describe the meaning of the data items. Unlike most other languages, there are no “field names” either — all we are declaring here is the types of the fields that make up the new type.

Constructors are also used as patterns on the left side of a function leg to “unpack” the record into its fields, and to give them names. Note the use of the type name in the type signatures.

(Recall that all type names must start with a capital letter. We’ll also use the convention that our user-defined types have a capital letter T at the end. This is not widespread practice in Haskell, but it helps beginners to easily distinguish between the type names, which are only used in data declarations and type signatures, and the constructor names, which are the “wrappers” used in the patterns and the bodies of the function code.)

-- now use our new type...
getName :: StudentT -> String
getName (Student nm age) = nm

getOlderStudent :: StudentT -> StudentT -> StudentT
getOlderStudent a b = if getAge a > getAge b then a else b
  where getAge (Student _ x) = x

s1 = (Student "Joe" 20)
s2 = (Student "Mary" 19)

getName s1 ⇒ "Joe"
getOlderStudent s1 s2 ⇒ (Student "Joe" 20)

data StudentT = Student String Int



7.1.2 Towards Abstract Data Types In the first example above, we had two fields that we wanted to group together as a single type of entity, so the reason for using a data declaration was quite obvious. But we often use a wrapper around single fields too, because it allows us to create an extra level of abstraction and type safety. For example, a set of integers is a collection of integers without any duplicates. We’d like to define a new data type called an IntSetT, and then write specific set-manipulation functions like union and intersection. Along the way, we need to choose some representation, or way of storing our sets, in Haskell. We might choose to store them as a list of integers, kept in sorted order, or perhaps as a tree, or perhaps in some other kind of representation. This new type becomes an Abstract Data Type (ADT), one that has particular properties and functions that can operate on values of the type, but one whose internal details can remain hidden.

The advantages of the type safety now kick in. We cannot simply pass any list of integers to one of these functions. They are not defined to work with lists of integers. These set-manipulation functions can only operate on lists that have been properly “wrapped”. Haskell’s multi-module system provides a way to export the ADT type name and the function signatures from one module, but still keep the constructors “private”. (Multi-module programs are unfortunately beyond the scope of these notes).

However, even within a single module the ADT notion is very useful in organizing our types and obtaining a better level of conceptual separation between the user and the implementor of the new type.

7.1.3 Polymorphic data types We should not be satisfied with a language that forces us to provide different set packages for different types of elements, so the next requirement is to generalize our data declarations by adding type variables, much like we previously did for polymorphic functions. If we’re going to store our sets as ordered lists, though, we need to ensure that the allowable types are orderable.

data IntSetT = IntSet [Int]

union :: IntSetT -> IntSetT -> IntSetT
intersection :: IntSetT -> IntSetT -> IntSetT
...
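One possible implementation of union for this representation, sketched under the assumption that the wrapped lists are kept sorted and duplicate-free (the data declaration is repeated so the sketch is self-contained):

data IntSetT = IntSet [Int]

union :: IntSetT -> IntSetT -> IntSetT
union (IntSet xs) (IntSet ys) = IntSet (merge xs ys)
  where merge [] vs = vs
        merge us [] = us
        merge (u:us) (v:vs)
          | u < v     = u : merge us (v:vs)
          | u > v     = v : merge (u:us) vs
          | otherwise = u : merge us vs

Note that the caller never sees the list inside: only values that have already been wrapped as IntSetT can be combined.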


The (Ord a) restriction on the data declaration means that we can not create a set such as (Set [sin,cos,tan]). But even though that restriction is in place, we are still required to re-state the Ord context restriction for each of the functions that processes the sets.

(Aside: The sharp reader may realize that a set package inherently needs equality to be defined, but here we have asked for a stronger requirement - orderability. This is because we’ve allowed some of our “representation choice” (i.e. to keep the structures ordered in some way for ease of processing) to “leak” out of the Abstract Data Type.)

7.1.4 Multi-case types A geometric shape could be a circle which is described by giving only its radius, or a triangle, which may be described by the lengths of all three sides. We can create a new type called ShapeT with alternative constructors:

This definition says that a ShapeT data item is either a Circle, in which case its value is one Double quantity (interpreted as a radius), or it is a Triangle, in which case our representation consists of three Double values.

The constructors in a data declaration don’t even have to have any fields associated with them. This even simpler case declares a new type called ColourT. Colours can be represented in programs in many ways: one way is to have a small number of preset enumerated names. Another representation is to use RGB values. A common representation for greyscale values is to store only a single integer intensity. The Hue, Saturation and Value (HSV) representation is also commonly found. Here is a declaration which defines a colour type that can cater for colour values in any of a mixture of these formats.

In an earlier chapter we emphasized the advantages of having the form of the data match the form of the program that processes it. Each of the cases in the data declaration can map onto a single leg of a function. By using the constructors and pattern matching we get a very elegant correspondence. This function converts any one of the representations into an “equivalent” RGB representation:

data ColourT = Red | Green | Blue | Yellow
             | RGB Int Int Int
             | HSV Int Int Int
             | GreyScale Int

data ShapeT = Circle Double
            | Triangle Double Double Double
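As a small sketch of how the alternatives of ShapeT might be used (area is an invented name; the notes’ own multi-leg example, toRGB, appears below): one leg per constructor, with the Triangle leg using Heron’s formula and assuming the three sides really do form a triangle.

area :: ShapeT -> Double
area (Circle r)       = pi * r * r
area (Triangle a b c) = sqrt (s * (s-a) * (s-b) * (s-c))
  where s = (a + b + c) / 2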

data (Ord a) => SetT a = Set [a]

union :: (Ord a) => SetT a -> SetT a -> SetT a
intersection :: (Ord a) => SetT a -> SetT a -> SetT a



The case-by-case capabilities of the data declaration mechanism in Haskell, together with the leg-by-leg function definitions based on pattern matching, might be one of its most important contributions to programming technology. Some authors[31] argue that the case-by-case organization of the data and the corresponding case-by-case coding of the functions lead to easy-to-design, easy-to-write, and easy-to-maintain programs.

7.1.5 Recursive data types - Binary Trees To describe data structures that do not have a fixed size we use recursive structures — data that is defined in terms of itself. As an example, a binary tree can be empty, or it can be a node containing a data item and two subtrees. The recursive nature of the structure arises because the subtrees are also binary trees. The Haskell data declaration closely reflects our notion of a tree of any type of items:

The right-hand side of a data declaration consists of one or more data constructors. Each constructor can have fields. The arity of a constructor is the number of component fields that it has. Constructors with zero arity are sometimes simply called data constants. In this example the constructors are Empty and Node, with arities 0 and 3 respectively.

Data structures are built by applying the constructors to arguments. Examples involving the list constructor : are common in the previous chapters. Here the expression (Node Empty 123 (Node Empty 456 Empty)) creates a tree which is a node with an Empty left subtree, a value 123, and a right subtree which has two Empty subtrees and a root value 456.

Some operations on trees don’t depend on the kind of item in the tree. For example, we could count the nodes or find the height without needing to work with any of the data in the tree. But other functions may only be applicable to trees with certain types of items. For example, the next function illustrates processing a tree of numbers by summing all the items in the tree. This function can only operate on trees of some kind of number. To make the function polymorphic yet restricted to numbers, we once again provide a context which limits what types can be substituted for the type variable.

data TreeT a = Empty | Node (TreeT a) a (TreeT a)
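For example (invented names), the two item-independent operations mentioned earlier, counting the nodes and finding the height, need no context at all:

countNodes :: TreeT a -> Int
countNodes Empty        = 0
countNodes (Node l _ r) = countNodes l + 1 + countNodes r

height :: TreeT a -> Int
height Empty        = 0
height (Node l _ r) = 1 + max (height l) (height r)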

toRGB:: ColourT -> ColourT
toRGB Red    = (RGB 255 0 0)
toRGB Green  = (RGB 0 255 0)
toRGB Blue   = (RGB 0 0 255)
toRGB Yellow = (RGB 255 255 0)
toRGB x@(RGB _ _ _) = x
toRGB (HSV h s v)   = ...      -- How do we map HSV to RGB?
toRGB (GreyScale v) = (RGB v v v)



The idea here is that a TreeT can be one of two alternatives. When sumTree is called and passed a tree argument, the particular form of the argument will determine which of the two legs of the function is selected. There is a very close correspondence between the data declaration which defines the two possible forms of a tree, and the two-legged function definition which implements the case-by-case analysis and processing. Notice again that it is up to the function definition to choose “field names” for the three parts of a non-empty node in the tree. We stress again — the names are not built into the data declaration, and some other function that processes the same data could decide to call the fields x,u,v instead of l,n,r.

We now introduce some more terminology. The arity of the type (not to be confused with the arity of the constructors) is the number of alternatives in the type.

In the examples above, ColourT is a type (without type parameters), with arity 7, while four of its data constructors have arity zero, two have arity three, and one has arity one. TreeT is a parameterized type with arity 2: Empty is a data constant (a data constructor with arity 0), and Node is a data constructor of arity three.

7.1.6 Visiting all the nodes in a binary tree
A useful example related to searching trees is that of visiting all the nodes of a tree. Here we build those nodes into a list, thereby flattening the tree, i.e. producing a list of all the data values in the tree in some predefined traversal order. It is often easy to start by constructing the type signature. Once we spot that flatten must return a list, (and therefore each of its legs must return a list), it helps to formulate the (slightly tricky) cases of what to do for the empty tree.

Simple modifications can be made for pre-order or post-order traversals.

flatten:: TreeT a -> [a]
flatten Empty = []
flatten (Node l n r) = flatten l ++ [n] ++ flatten r

flatten (Node Empty 123 (Node Empty 456 Empty)) à [123, 456]
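As noted above, only the position of [n] needs to change for the other traversal orders. A possible sketch (the names flattenPre and flattenPost are ours, not from the notes):

flattenPre:: TreeT a -> [a]
flattenPre Empty = []
flattenPre (Node l n r) = [n] ++ flattenPre l ++ flattenPre r

flattenPost:: TreeT a -> [a]
flattenPost Empty = []
flattenPost (Node l n r) = flattenPost l ++ flattenPost r ++ [n]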

sumTree:: (Num a) => TreeT a -> a
sumTree Empty = 0
sumTree (Node l n r) = (sumTree l) + n + (sumTree r)

sumTree (Node Empty 123 (Node Empty 456 Empty)) à 579


7.2 Huffman Encoding
We now turn to a more substantial example: a well-known data compression technique.

Messages are made up of sequences of characters or symbols which must be represented somehow.

In fixed-length encoding schemes (like ASCII), each character consumes the same number of bits. In ASCII, each character occupies 8 bits. But if you only had a few characters or distinct symbols that you wanted to encode, say 16 symbols, you could invent an alternative encoding in which each symbol needed only 4 bits. With k bits per symbol, you get 2^k bit-patterns, or 2^k possible symbols that can be represented.

Huffman encoding doesn’t use fixed-length encoding. It compresses the message into a sequence of bits, where each symbol has its own sequence of bits to represent it. But Huffman’s algorithm organizes things so that the most frequently occurring symbols have short bit representations, while the infrequently used symbols need more bits.

We start with the set of n possible symbols x1, x2, ... xn and their probabilities of occurrence in the messages, denoted by p(x1), p(x2), ... p(xn). These probabilities are assumed to be fixed, and known beforehand. (In English, for example, we know that the symbol ‘e’ occurs much more often than ‘q’.) We must find an encoding which compresses the messages so that the average length of random messages is minimized. (When we say “random messages” we mean random, but adhering to our probability distribution. So, a random chunk of English text, for example.)

The essential idea behind Huffman is to break away from the fixed-length encoding in which every character has to be stored in the same number of bits. If we can invent a system in which the most frequently occurring characters have very short bit patterns, (even if that means that the infrequently used characters have longer patterns), then the average length of the encoded message will be minimized.

This algorithm is found in data communications and compression software, and it is often used as a baseline for comparing other algorithms, because it can be proved that it produces the best possible lossless compression if symbols are all encoded independently of one another. (Lossless means we can recover the original message exactly.) Actually, one can do lossless compression even better than Huffman encoding. By considering more than one symbol at a time and encoding “runs” of symbols, (which is what the popular zip compression does on our computers), one can achieve better performance. See Witten et al. [75] for details.

The Huffman algorithm is a two phase algorithm. Our problem statement is “Given the symbols and their probabilities, determine an efficient encoding.”

The first phase builds a strictly binary tree. We begin by treating each character or symbol as a leaf node, which, of course, is trivially a tree. This is the starting collection of trees. (Initially, every tree is just a simple leaf node.)


Now we can find any two trees with the smallest probability of occurring, and combine these two as the children of a new parent node. The new tree’s probability is the sum of its children’s probabilities. The children are deleted from the “live” collection of trees, and the new tree (with parent as the root and the two children as its subtrees) is inserted into the collection, to be used as a candidate for further combinations with other nodes. This iteration process – combine the two least-weight subtrees into a new tree – is performed repeatedly. Since each step removes two child trees and inserts the parent tree in their place, the algorithm will combine all the nodes into a single binary tree in n-1 steps.

Recall that the path length of any node x, denoted by l(x), is its distance from the root. The binary tree built by the Huffman algorithm has an interesting property: for all symbols x and y, l(x)>=l(y) iff p(x)<=p(y). In other words, leaf nodes with high probabilities are never further from the root than those with lower probabilities. The most frequently occurring symbols are close to the root of the tree.

Phase two of the algorithm assigns the bit patterns to the original symbols, based on their positions in the tree. The bit pattern assigned to a symbol is just the sequence of edges from the root to that symbol, where each left descendant edge is labelled 1, and each right descendant edge is labelled 0 (or vice-versa). The number of bits representing a symbol will therefore be the path length of that symbol.
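As a small worked example (ours, not from the notes): take four symbols with probabilities A=0.5, B=0.25, C=0.125 and D=0.125. The algorithm first combines C and D into a subtree of weight 0.25, then combines that subtree with B (giving weight 0.5), and finally combines the result with A. The path lengths are 1 for A, 2 for B, and 3 for C and D, giving codes such as A=0, B=10, C=111, D=110. The expected code length is 0.5×1 + 0.25×2 + 0.125×3 + 0.125×3 = 1.75 bits per symbol, compared with the 2 bits per symbol that a fixed-length code for four symbols would need.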


Now that we understand Huffman’s encoding algorithm, let’s turn it into Haskell.

Because the tree is strictly binary (a node cannot have one empty subtree), and because we can assume that the tree will never be empty, we can define a tree with explicit Leaf nodes, which is a bit different from the earlier examples in this chapter. This leads to this data representation:

Because we know exactly what we are going to store in this tree (a Char at each node), we didn’t create a more general polymorphic tree using type variables. This will keep it simpler for us. Modern software development ideas like eXtreme Programming suggest that one does better by not generalizing when it is not needed!

Each iteration combines the two trees with the smallest weights into a new tree. Finding the two smallest elements will be easy if we keep the list of partially built subtrees in ascending order of weight — the smallest weight subtrees will be at the front of the list. Of course, when we’ve combined two children, the resulting subtree needs to be put back in the list to replace the two subtrees that we delete, and at this point we’ll have to insert it at the correct position to keep the list ordered.

We are now ready to code up the top level: the huffman function must build the tree, then find the paths. Initially, we’ll start with a list of probabilities and character symbols. An initial conversion of the input data turns each symbol into a tree which is a Leaf, and inserts each, with its weight, into a list so that the list remains sorted on the probabilities.

data HTreeT = Leaf Char | Node HTreeT HTreeT
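For instance, the following expression (just an illustration of the shape of HTreeT values, not output of the algorithm below) builds a strictly binary tree over three characters:

Node (Leaf 'a') (Node (Leaf 'b') (Leaf 'c'))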


This hinsert function will also be useful later as we combine nodes and have to put the resulting trees back. (We’ve omitted its type signature.)

Once initialSort has produced a sorted list of leaf trees to begin with, the tree can be built as follows: if there is only one tree remaining in the list, the task is complete, else combine the two smallest (front of list) elements, and insert the result in the appropriate place in the rest.

At the base case, buildTree returns the completed tree.

Next we walk the tree, and return a list of each leaf node, and where in the tree we found it. As we descend the tree we pass down an accumulating parameter which keeps track of the path from the root. When we find a leaf node we create a pair of its character and its path. Note the similarity to flatten discussed in Section 7.1.6.

-- this is still under the scope of the where above...
buildTree [(p,t)] = t
buildTree (a:b:xs) = buildTree (hinsert (combine a b) xs)
       where combine (p1,t1) (p2,t2) = (p1+p2, (Node t1 t2))

huffman:: [(Double,Char)] -> [(Char,String)]
huffman xs = getPaths (buildTree (initialSort xs))
   where
   initialSort [] = []
   initialSort ((p,s):xs) = hinsert (p,(Leaf s)) (initialSort xs)

   hinsert a [] = [a]
   hinsert a (b:x) | fst a <= fst b = a:b:x
                   | otherwise      = b:hinsert a x


That’s all there is to it!

7.3 Exercises
1. A set can be represented as a list which contains no duplicates. Create a new data type SetT, and write a package of typical set operations to insert and delete elements, perform union, intersection, membership tests, cardinality, set equivalence, set difference, and subset tests. Also provide a means to convert a list with duplicates into a set. By keeping the set as an ordered list, the cost functions for each operation can be made linear.

2. A polynomial of one variable can be represented as a list of its coefficients. We choose to keep the coefficients in order of increasing powers of x. For example, the polynomial

3.1x^4 + 4.2x^3 + 9.3x + 7.4
is represented by the PolyT value Poly [7.4, 9.3, 0.0, 4.2, 3.1]

a. Write a function to add two polynomials together. padd:: PolyT -> PolyT -> PolyT

b. Write a function to evaluate a polynomial for an arbitrary x. peval:: PolyT -> Double -> Double

c. Write a function to multiply two polynomials together: ptimes:: PolyT -> PolyT -> PolyT

3. Write a function which finds the height of a binary tree.

4. Write a function which counts the number of internal nodes in a binary tree.

5. Write a function which doubles every element in a binary tree of numbers.

getPaths t = gp t ""
   where gp (Leaf s) path   = [(s, path)]
         gp (Node l r) path = gp l (path ++ "1") ++ gp r (path ++ "0")

-- some test cases and output now...
huffman [(0.6,'A'), (0.3,'B'), (0.01,'C'), (0.03,'D'), (0.02,'E'),
         (0.015,'F'), (0.015,'G'), (0.01,'H')]
à [('C',"11111"), ('H',"11110"), ('E',"1110"), ('F',"11011"),
   ('G',"11010"), ('D',"1100"), ('B',"10"), ('A',"0")]

huffman [(0.5,'A'),(0.2,'B'),(0.1,'C'),(0.1,'D'),(0.05,'E'),(0.05,'F')]
à [('B',"11"), ('D',"101"), ('E',"10011"), ('F',"10010"), ('C',"1000"), ('A',"0")]


6. Write a function which does one pass over a non-empty strict binary tree and returns a pair containing the minimum and the maximum elements in that tree.

7. Write a function which searches for an element in a binary search tree. It should return True if the number is found, otherwise False. A binary search tree is one in which all values in the left subtree are less or equal to the root, and all items in the right subtree are larger than the root. This property holds for all subtrees too.

8. Write a function which builds a list of elements into a binary search tree.

9. Decode a Huffman bit stream, turning it back into a list of characters. One approach is to keep the original tree built by the Huffman encoding, then use the incoming bit stream to switch left or right until a leaf is encountered, in which case you output the symbol and begin again at the top.

10. Make some measurements of the frequencies of character occurrences in a text file on your computer, and encode these using Huffman’s algorithm. Compute the expected average character bit-length, (this is the sum of their bit-strings weighted by their probabilities), and compare this to the unencoded bit-length of the characters (usually one 8-bit byte per character). What percentage space saving could you expect if text files were compressed with the Huffman algorithm? How does this compare to any other compression programs that you have available?

11.

a. An AVL tree (or height-balanced tree) is a binary tree which is not necessarily perfectly in balance, but is “close” to balanced. More precisely, a binary tree is an AVL tree if, for every node, the height of the left and right subtrees differ by no more than one. Choose a representation for a binary tree, and write a function to determine whether a given tree is AVL balanced.

b. The “simple” solution to this problem (traversing each sub-tree twice: once to find its height and once to ask if it is balanced) has very poor performance. Draw a full binary tree of 5 levels of nodes, and work through your code. Determine how many times each leaf node is visited. Rewrite your function so that it only performs one pass over each subtree, and returns both the height and whether the tree is balanced. Test the program and demonstrate or comment on the performance.

12. A decoration called a mobile consists either of a leaf decoration, or of a beam suspended from a single thread which is attached to its center point. Each end of the beam can in turn support other mobiles. Every beam in the mobile should be balanced. Devise a representation for a mobile, and write a function which determines whether a given structure is in balance.


(For the enthusiastic, allow the thread to be attached at some point other than the center.)

13. A BB tree (or bounded-balanced tree, weight-balanced tree) is a binary tree in which the fraction of nodes in the left subtree (over those in the whole tree) is kept between some pre-determined limits. These limits are denoted by a value β which must lie between 0 and 0.5. The tree is BB balanced at a particular value of β iff, for every node in the tree,

   β ≤ (numNodes(leftSubtree) + 1) / (numNodes(tree) + 1) ≤ 1 − β

For example, when β=0.3, each subtree must have between 30% and 70% of its nodes in the left subtree. As β approaches 0.5, the balancing requirements become more strict.

a. Write a function to find the maximum value for β for which the tree is BB balanced. (i.e. find the fraction value for the most imbalanced node in the tree.)

b. Write a function to determine whether a tree is BB balanced for some value of β.

14. (Reasonable size mini-project) Write a package to insert, delete, and search for items in a binary search tree. The tree must be kept in approximate balance, either AVL balanced or BB balanced.

7.4 Key Points
• A key aspect of writing good programs is to understand the data structures, and to specify them precisely.

• data declarations introduce new data structures in Haskell.

• Haskell uses a single, powerful data declaration mechanism.

• Each alternative form (variant) of the data structure has its own constructor tag, and can have zero or more fields.

• Data structures are made polymorphic by parameterizing their data declarations with type variables.

• Data structures can be recursive.

• The constructors are used in patterns in the function definitions, where they select the appropriate leg, and name the fields. This leads to definitions with one or more patterned legs for each variant of the data, so that the form of the program matches the form of the data.


8 Lambda Expressions

8.1 Lambda Expressions
Why is λ used on the cover of these notes? What's the big deal with lambda?

The original theory behind functional languages was developed in the 1930's, long before we had computers. The formalism used there was called the Lambda Calculus, and it used the lambda symbol to introduce an "anonymous" function definition.

All the functions we've worked with so far have been given explicit names. What we've seen is that we can use functions like other kinds of data: in lists, or passing a function into another function like filter.

A lambda expression is an anonymous function definition: it doesn't need a name, and can just be defined right at the point you need to use it. It has the form

Unfortunately, keyboards don't have a λ key, so the Haskell designers used the closest thing they could find instead – the backslash. Like all Haskell expressions, lambda expressions have a specific type that you can infer.

Here are some pairs of definitions of functions: in each pair the two definitions are completely equivalent in Haskell:

The advantage of anonymous functions is that they can greatly reduce the amount of "plumbing" that you might otherwise need. Lambda expressions let you keep the logic all in one place, as opposed to having to define a new function somewhere else in the program, think up a name, clutter your namespace, and put extra cognitive load on the person reading the program. (It is more difficult to read things if you have to keep jumping off elsewhere to find what is being referred to.)

One drawback of anonymous functions is that, because they don't have a name, you can't (easily) call them recursively! (There are some tricks, and some languages even introduced special keywords to allow an anonymous function to make self-calls.)
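The usual workaround is simply to give the function a name in a local where or let clause, at which point it can call itself as normal. A small sketch of our own (countdown and go are invented names):

countdown n = go n
   where go 0 = [0]
         go k = k : go (k-1)

-- countdown 3 gives [3,2,1,0]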

inc1 x = x + 1
inc2 = \ x -> x+1

add1 x y = x + y
add2 = \ x y -> x+y

-- pass an anonymous function to filter
filter (\ n -> n `mod` 2 == 1) [1..5] à [1,3,5]

\ {function parameters} -> {function body}

filter odd [1,2,3,4,5] à [1,3,5]


8.2 Towards a Query Language
A common real-world problem is to manipulate collections of data. Typically you need to compose queries that usually involve selecting some subset of the records, grouping them on some property, aggregating (counting or summing) elements in each group, and then sorting and presenting the results.

Domain Specific Languages (like SQL) have a special syntax to make this sort of thing easy (but SQL only works with databases), whereas some newer ideas like the C# LINQ query let you build and execute queries on very general data sources – databases, xml files, spreadsheets, lists of records, etc. In C#, for example, the queries make extensive use of lambda expressions to control how the data is sorted, grouped, filtered and aggregated. We'll use that kind of style in our example here.

Our problem is to produce a count of how many students are registered for each degree, since some starting date. We'll begin with some definitions by just using a small list of students as our data source:

One not-previously-seen Haskell feature above is the use of field names in our data declaration. Field names make the data declaration look a lot more like classic declarations in most other languages. In Haskell, they can be used as accessor functions to extract individual fields from a record. They can also be used in a few other ways, e.g. to explicitly name the arguments given to the constructor.
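To illustrate (a small sketch using the Student type above; the values are our own): the field name year behaves as a function of type Student -> Int, and a constructor can be applied with explicitly named fields, in any order:

year (S 2008 "L Lohan" BA)                      -- gives 2008
S {name = "K Reeves", deg = BSc, year = 2011}   -- named-field construction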

We'll now show how to select all students registered since 2009, group them by the degree they're taking, aggregate the data by producing counts of each group, and then sort the data. We'll do it here in a couple of small steps, but quite often you'll see logic like this all written as a single, fat query expression.

import List

data Degree = BCom | BSc | BA deriving (Eq, Ord)

data Student = S {year::Int, name::String, deg::Degree}

testData :: [Student]
testData = [ S 2008 "L Lohan" BA,     S 2009 "A Jolie" BCom,
             S 2008 "P Cruz" BCom,    S 2009 "M Damon" BCom,
             S 2010 "G Bridges" BCom, S 2009 "J Travolta" BSc,
             S 2008 "P Hilton" BSc,   S 2009 "C Diaz" BSc,
             S 2010 "B Stiller" BCom, S 2009 "C Grant" BA,
             S 2009 "D Quaid" BA,     S 2009 "M Ryan" BA ]


First we filter the source to select only those students of interest:

There is a groupBy function in the List module, but before we call it we'll need to have the students sorted so that all the students taking the same degree are alongside one another in the list. (The library's groupBy is not very clever!)

To group by degrees, we sort and then group; this returns a list of lists of students.

From the list of lists, we now aggregate to produce a corresponding list of pairs. Each pair has the degree and its popularity count.

Finally, sort this into descending order of popularity of the degrees:

Put it all together and run it:

Later in these notes we will show some even more convenient notation for creating pipelines of functions that feed their outputs into the next stage.

8.3 Reflection
For many years Functional Programming has been an academic and theoretical novelty, not often found in the real world. But recently there has been a surge in adopting functional features into mainstream languages. C#, Python, Ruby, J#, Scala and many others are embracing functional principles and patterns. You'll be talking about lambda expressions in your workplace sooner than you might think!

8.4 Built-in functions you should know
filter, sort, sortBy, map, groupBy, compare.

query1 ss = sortByCounts (countByGroups (groupByDegree (since2009 ss)))

query1 testData à [(BCom,4),(BA,3),(BSc,2)]

sortByCounts pairs = sortBy (\ (_,c1) (_,c2) -> compare c2 c1) pairs

countByGroups groups = map (\ xs -> (deg (head xs), length xs)) groups

groupByDegree ss = groupBy (\ x y -> deg x == deg y) (sortByDegree ss)

sortByDegree ss = sortBy (\ x y -> compare (deg x) (deg y)) ss

since2009 ss = filter (\ s -> year s >= 2009) ss


9 Program Transformation
The property of referential transparency is a mathematical notion that we are all familiar with. It allows us to substitute any expression for an equivalent one, without changing the meaning, or value, of the expression. For example, we can usually substitute x + x by the expression 2 * x. (This property is not always true in conventional languages — further discussion is in Section 16.2).

This chapter develops the basic framework for reasoning about Haskell programs. Initially the programmer uses these tools to help understand, write, verify and transform the code, but ultimately we hope to see automatic transformation systems that use this kind of framework for developing and optimizing programs. Some encouraging progress has already been made in this direction[23, 50], and we expect more widespread use of techniques like these as Computer Science matures.

9.1 Some Laws of Haskell
High-school algebra has introduced us to the notion that algebraic expressions can be manipulated via a few simple laws of substitution. A Haskell program is also an expression, and can also be manipulated. This section presents some Haskell identities. (An identity is a law stating that one expression is equivalent to, or can be substituted for, another.) Some of the identities are axiomatic (i.e. they are true because of the way we defined the objects and operators), while others can be derived from the axioms, or are reasonably obvious.

head (x:xs) = x
tail (x:xs) = xs
[]++xs = xs                              ([] is a left identity of ++)
xs++[] = xs                              ([] is also a right identity of ++)
xs++(ys++zs) = (xs++ys)++zs              (++ is associative.)
[x]++xs = x:xs                           (A left singleton to ++ reduces to a cons.)
(x:xs)++ys = x:(xs++ys)
length (xs++ys) = length xs + length ys
reverse (xs++ys) = reverse ys ++ reverse xs
reverse (reverse xs) = xs                (for finite xs)
(take n xs)++(drop n xs) = xs, if n >= 0

As our functions get more complex the laws may not be as obvious, so we need a formal means of proving equivalence of expressions.

9.2 Structural Induction


Just as mathematics often uses induction to prove properties about the whole or natural numbers, we often use a modified form of induction which considers the structure of an object. Recall that we begin with an hypothesis, H(x), (the thing we are trying to prove) and that induction over the whole numbers consists of a base step and an inductive step:

Base case — H(0): Show that the hypothesis H is true for the base case, usually 0.

Inductive case — H(n)⇒H(n+1) : Show that whenever H(n) holds (the hypothesis is assumed true for n), then this implies that H(n+1) must also hold.

The second case is called the induction step.

Once these two steps are proven we are sure that the hypothesis holds in all cases (i.e. it is no longer a hypothesis, it is now a proven fact, or theorem). This is so because the claim can be proved for any natural number by starting from the base case, and using the H(n+1) implication/induction step repeatedly.

How do we modify induction for our needs? Our basic structure in Haskell is the list. To prove that some property holds for all elements of a list, we use structural induction. This induction is valid, because all lists can be generated by starting from the base case and using the constructor operation repeatedly. The steps needed for a proof by structural induction are:

Base case — H([]): Show that the hypothesis holds for the base case.

Inductive case — H(xs)⇒H(x:xs): Show that whenever H(xs) holds for an arbitrary list xs, then this implies that H(x:xs) must also hold.

The steps required for an inductive proof are strongly related to the steps needed to write a recursive function. We construct recursive functions by handling the base case, and then by programming case n+1 in terms of case n (for functions that operate on integers), or by programming case (x:xs) in terms of case xs (for functions which use lists). The existence of a recursive function therefore suggests the formulation of an inductive proof, and vice-versa.

Example: One of the “intuitive” laws given above states that ++ is associative. We prove this formally. The legs of the definition are labelled for reference:

Hypothesis: xs++(ys++zs) = (xs++ys)++zs. We wish to show that this holds for any lists xs, ys, and zs. Now the induction usually operates on only one of the variables, so the next step is to choose which one of these should be the induction variable. The function ++ performs its job by taking its left argument apart: the relationship between writing recursive functions and inductive proofs suggests that any proof about properties of ++ will work best if we use induction on xs. Thus we choose to “parameterize” the hypothesis only on xs, and we begin with the hypothesis:

H(xs): xs++(ys++zs) = (xs++ys)++zs.

[] ++ ys = ys                    (++.1)
(x:xs) ++ ys = x:(xs ++ ys)      (++.2)


Required to Prove: that the hypothesis always holds.

Start with the base case, does H hold when xs = []?

xs++(ys++zs)           LHS (left hand side) of H([])
= []++(ys++zs)         (xs = [])
= (ys++zs)             (++.1, forward)
= ys++zs               (remove parentheses)

(xs++ys)++zs           RHS (right hand side) of H([])
= ([]++ys)++zs         (xs = [])
= (ys)++zs             (++.1, forward)
= ys++zs               (remove parentheses)

So the base case works and is proved. Now what about the inductive case?

Does H(xs) imply H(x:xs)? We assume that H(xs) holds and try to prove H(x:xs), that is, that (x:xs)++(ys++zs) = ((x:xs)++ys)++zs.

(x:xs)++(ys++zs)           LHS of H(x:xs)
= x:(xs++(ys++zs))         (++.2, forward)
= x:((xs++ys)++zs)         (by the H(xs) assumption)
= (x:(xs++ys))++zs         (++.2, backward)
= ((x:xs)++ys)++zs         (++.2, backward)

This is the RHS of H(x:xs). So the inductive case has also been proved, and this completes the proof.

Another Example Proof
Given this definition:

leng [] = 0                   (leng.1)
leng (_:xs) = 1 + leng xs     (leng.2)

prove the hypothesis leng(xs++ys) = (leng xs) + (leng ys)

We use induction on xs while permitting ys to range over any value. So

H(xs) : leng(xs++ys) = (leng xs) + (leng ys)

Required to Prove: that the hypothesis always holds.

Start with the base case, does H([]) hold?

leng ([]++ys)          LHS of H([])
= leng ys              (++.1)
= 0 + leng ys          (identity element of addition)
= leng [] + leng ys    (leng.1, backwards), which is the RHS of H([])

Inductive case: Does H(xs) imply H(x:xs)?

Assume H(xs), then show that


leng ((x:xs)++ys) = leng (x:xs) + leng ys

leng ((x:xs) ++ ys)          LHS of H(x:xs)
= leng (x:(xs ++ ys))        (++.2)
= 1 + leng (xs ++ ys)        (leng.2)
= 1 + leng xs + leng ys      (by hypothesis H(xs))
= leng (x:xs) + leng ys      (leng.2, backwards)

So we've completed the proof: we've shown that the left-hand side can be rewritten repeatedly until we get to the right-hand side.

9.3 Equivalence of Functions
Two functions are equivalent if they produce the same results for all possible inputs. In the previous chapter we saw two functions for reversing a list: one used backward recursion, while the other used forward recursion. Can we be sure that the efficient version is equivalent to the original? We use a little eureka insight and structural induction to prove the equivalence.

Recall the two definitions given in Section 4.1.6:

Prove that rev xs = reverse xs, ∀ xs.

Note that reverse does nothing other than to provide an interface to the workhorse, rev’. This suggests that we need to establish an appropriate equivalence between rev and rev’. We get rid of the reverse (by substituting its right side for the left) and set out to prove that rev xs = rev’ xs [], ∀ xs.

An attempt to use induction often fails because the hypothesis is not general enough (is too weak) to establish the proof. This is the case here. One hint is to go back to the goal and look for a more general relationship (a stronger statement). (Here comes the eureka step.) Our hypothesis is just a specialization of a more general relationship between rev and rev’, which we use instead:

H(xs) : rev xs ++ ys = rev’ xs ys

(Recall that the precedence rules interpret this as (rev xs) ++ ys.)

Proving this will allow us to derive the original by setting ys to [] and noting that [] is the identity for ++. We proceed by structural induction on xs.

Required to prove: The generalized relationship H(xs): rev xs ++ ys = rev' xs ys

rev [] = []                            (rev.1)
rev (x:xs) = rev xs ++ [x]             (rev.2)

reverse xs = rev' xs []
   where rev' [] ys = ys               (rev'.1)
         rev' (x:xs) ys = rev' xs (x:ys)     (rev'.2)


Base case: Does H([]) hold?

rev [] ++ ys             LHS of H([])
= [] ++ ys               (rev.1)
= ys                     (++.1)
= rev' [] ys             (rev'.1, backwards), which is the RHS of H([]).

This establishes the base case.

Inductive case: Does H(xs) imply H(x:xs)?
Assume H(xs) and show that rev (x:xs) ++ ys = rev' (x:xs) ys, ∀ ys.

rev (x:xs) ++ ys             LHS of H(x:xs)
= (rev xs ++ [x]) ++ ys      (rev.2)
= rev xs ++ ([x] ++ ys)      (++ is associative)
= rev xs ++ (x:ys)           (the singleton law, which was not formally proved!)
= rev' xs (x:ys)             (we've managed to use the hypothesis!)
= rev' (x:xs) ys             (rev'.2, backwards)

This shows the left-hand side of the inductive hypothesis can be rewritten to the right-hand side, so we've completed the proof.

9.4 Exercises
1. Prove the singleton law for ++.

2. [] is an identity for ++. Its left identity property derives directly from the definition ++.1. Prove that [] is also a right identity, i.e. xs ++ [] = xs.

3. Prove the hypothesis that

take n xs ++ drop n xs = xs ∀ n ≥ 0

Hint: Here the induction has to range simultaneously over n and xs. Two base cases must be established:
   n = 0, ∀ xs
   xs = [], ∀ n

The hypothesis should be extended simultaneously in both variables, i.e. we need to establish the inductive case for (n+1) (x:xs).

4. This chapter stresses the close relationship between induction proofs and recursive formulations of a function. In the past, students have often left out one of the base cases (typically the n == 0 case) in the above proof. If the case catering for n == 0 was omitted from the definitions of take and drop, under which conditions (precisely) would the following identity still hold, and under which conditions would it fail to hold?
   take n xs ++ drop n xs = xs ∀ n ≥ 0

What about if the n==0 base case was provided for, but the base case for the empty list was not present?



5. Prove that reverse (xs ++ ys) = reverse ys ++ reverse xs.

Hint: When we have more than one definition of a function, the backward recursive cases are usually easier to use in the proofs, since the formulation of our backward recursion is closer to the induction. Since rev and reverse have already been shown to be equivalent we can prove the simpler form

rev (xs ++ ys) = rev ys ++ rev xs. The keen theorem prover may like to try the more challenging proof using the forward recursive definition directly.

6. Section 4.3.2 gives a doubly recursive and a more efficient version of the Fibonacci numbers. To show that these are equivalent it is useful to establish the following lemma:

fib_aux n p q = fib_aux (n-1) p q + fib_aux (n-2) p q
Prove the lemma.

7. Use the lemma proved in (6) to show that the two programs for Fibonacci numbers are equivalent.

8. Cost functions often grow according to the sequence 1+2+3... Use induction to prove

1 + 2 + 3 + ... + n = n(n+1)/2

Can you find a simpler way than induction to prove this identity?

9. The function all f xs returns True iff f x à True for all x ∈ xs.

For example, all odd [1,9,11] à True, but all even [1,2,3] à False.

Write the function all, and prove the identity all p (x++y) = (all p x) && (all p y)

10. The function any f xs returns True iff f x à True for any element x ∈ xs. Write this function, and find and prove the identity corresponding to the one in the previous question.

11. The elem function was given as an exercise in Section 4.4. Prove the identity elem e (x++y) = (elem e x) || (elem e y)

9.5 Key Points
• Functional programs are mathematical objects, and can be rigorously manipulated.



• Function definitions provide the axioms.

• There are a number of simple identities and algebraic laws.

• Direct proofs using the laws are possible.

• Structural induction is useful in proofs that involve lists or other recursive data structures like the trees in the previous chapter.

• There is often a close match between the form of the recursive program, and the form of the inductive proof.

• Generalizing the hypothesis is sometimes the key to finding a good proof.


10 More about Functions

10.1 Some Notation
The notation x::T indicates that object x has type T. A function has a type expression of the form D -> R, indicating a mapping between elements of the domain D and the co-domain R. A total function is defined for all elements from its domain, while a partial function may only be defined for some domain values. We have seen examples of partial functions, such as those which are only defined for even-length input lists, or only for non-negative numbers. In functional programming, functions are usually understood to be partial.

An identifier which occurs as a parameter of a function is said to be bound in the function. Identifiers which do not occur as parameters are said to be free with respect to that function, or non-local to that function. They will, of course, be bound at some enclosing level. (The global system identifiers are all defined implicitly at the outermost level.)
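A small sketch (the names are ours): in the definitions below, x is bound in addK because it is a parameter, while k is free (non-local) in addK; it is bound at the enclosing, here global, level.

k = 10
addK x = x + k     -- x is bound in addK; k is free in addK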

A function which operates on an elementary data item and produces another elementary data item is a first-order function. A higher-order function is a function which takes or delivers another function as its argument or result. Integration and differentiation are common examples of higher-order functions: they accept functions as arguments, and deliver functions as results. One classification of languages (or formal systems in general) is whether they are first-order systems (allowing only first-order functions) or higher-order systems. The two classes have fundamentally different expressive powers and properties.

Sometimes higher-order functions are called functionals, to distinguish them from ordinary functions: the terminology is most useful if the system being described treats the first- and higher- order cases differently. In Haskell there is no sharp distinction, so we prefer to use the term function in all cases.

The previous chapters have dealt mainly with first-order functions, and new functions have been created by using the global and local definition mechanisms. This chapter discusses infix operators and then examines expressions which yield function-valued results.

10.2 Infix Operators

Infix operator notation is a syntactic sugaring mechanism that can improve readability. A general design principle of Haskell is that, as far as possible, all the built-in mechanisms should also be available for user-defined functions. There is therefore a means of defining your own infix operators.

Infix operators and infix constructors are distinguished by their lexical syntax. The compoundable symbols which you can use to create your own tokens in Haskell are


!#$%&*+./<=>?@\^|~:
An operator in Haskell is either the minus sign, or an identifier which comprises one or more compoundable symbols. Any identifiers that begin with a colon denote infix data constructors rather than infix functions.

An operator can be used as if it was a prefix function by enclosing its token in parentheses. The opposite can also be done: a function can be used as an infix operator by enclosing the name in backquote apostrophes. So the following are equivalent:

elem (3*4) [1..10]
((*) 3 4) `elem` [1..10]

In order to determine the syntax of an expression involving operators it is necessary to know the precedence and the associativity of the operators involved. The operator declaration provides this information.

If an operator (or a variable identifier which is used as an operator by the backquoting convention) does not have an explicit declaration for precedence and associativity, it is assumed to be “highest” left-associative precedence.

Three keywords denote the three cases, infixl, infixr, infix. This example shows some fragments of a program which defines a power operator (^) and some operators for manipulating sets. (The ^ operator is already defined in the prelude.)

In the patterns for the definitions of the functions, the operator occurs between its parameter patterns. Precedence levels must be in the range 0..9, where a bigger number denotes higher precedence. Data constructors can also be declared to be infix, as we have already seen with the infix cons.

10.3 Currying
Consider a prefix version of the binary addition function, (+), such that

(+) 2 3 à 5

Traditionally, we might have said that the function (+) is applied to two integer arguments, and produces an integer result. A typical type signature could have been

(+):: (Int × Int) -> Int     // But this is not Haskell!

Haskell treats all functions as if they are applied to their arguments one at a time. Recalling that -> associates to the right, the type signature can be re-written to reflect this notion:

infixr 8 ^
infixl 6 `union`
infixl 7 `intersection`

(^) :: Int -> Int -> Int
union, intersection:: Set a -> Set a -> Set a

x `union` y = ...
...
... a `union` b `intersection` c ...


(+):: Int -> Int -> Int

The interpretation is that (+) takes just one argument, and returns a new function (of type (Int -> Int)) which expects one more argument, and yields a numeric result. For example, the result of evaluating ((+) 3) is a new function which adds three to its argument. The function is said to be partially applied, or curried, after the logician, Haskell Curry, (after whom Haskell was named), who first exploited the technique. The first argument has become “frozen-in”, and the resulting function can be passed as an argument, or applied to another argument. (Note that a partial function is not the same thing as a partially applied function.)

Related to currying is the principle of extensionality. Two functions f and g are extensionally equal if they produce the same outputs for all inputs. Thus, if

f x = g x for all x

we can “cancel” the x and denote this more succinctly as f = g

As a consequence, the following two definitions of inc have exactly the same meaning in Haskell:

inc x = (+) 1 x
inc = (+) 1

10.4 Sections
Infix operators can also be used with a single argument: the resulting construct is called a left or right section. Sections are always enclosed in parentheses, and represent functions which have had either their left or their right arguments already supplied. For example, (/ 3.14) expects a left argument, and divides it by 3.14. (3.14 /) expects a right argument, and divides it into 3.14.
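For instance (small examples of our own), sections are convenient as arguments to the list processing functions:

map (/ 3.14) [6.28, 9.42]      -- divide each element by 3.14
map (3.14 /) [1.0, 2.0]        -- divide 3.14 by each element
filter (> 10) [5, 12, 8, 40]   -- keep the elements greater than 10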

Because the minus sign is used in maths both for unary and binary operators, e.g. for a-b and for -x, (a silly convention we ought to change – we could easily invent a different way of expressing negation), the Haskell expression (- e) is an exception to the syntax rules for sections: it means negate e; it is not a section.

Note that the notation (*) which allows both operands to be omitted can also be regarded as a section — one in which neither operand is present.

10.5 Examples

These examples show that although functions such as double and inc are easy to define, we can just as easily do without them and use the curried expressions directly. These forms are most often used as arguments to the list processing functions, to which we devote considerable attention in the next chapter.

double = (*) 2          -- or (* 2) or (2 *)
inc    = (+) 1          -- or (+ 1) or (1 +)
contains_10 = elem 10



The type signature of (contains_10) can be deduced by noting that the type signature of elem is

Eq a => a -> ([a] -> Bool)

The partial application (elem 10) instantiates the type variable a to Int and freezes in the first argument of elem, so the resulting type is

(elem 10) :: [Int] -> Bool.

Currying does have a difficulty. Consider the function to subtract one from its argument. We are tempted to write ((-) 1), but as soon as we apply this curried result to another argument we realize our mistake:

((-) 1) 10 à -9

The difficulty lies with non-commutative operators, since currying can only freeze in the arguments from left to right, not freeze the rightmost one first. This non-symmetrical behavior spoils the usefulness of the notation in some ways. The following function can work around this difficulty:

The flip function accepts a function and reverses the order of that function’s first two arguments. Thus the expression (flip (-)) denotes a new version of the minus function that expects its arguments in reversed order.
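A brief sketch of its use (dec is our own name):

dec = flip (-) 1      -- dec 10 gives 9, whereas ((-) 1) 10 gives -9

The standard prelude also provides subtract for this common case: subtract 1 10 gives 9.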

The function flip is an example of a combinator. The field of combinatory logic studies functions which have no free variables. Although any function without free variables is technically a combinator, the terminology is usually reserved for a small set of higher-order functions that are of particular interest to Computer Scientists because they can be used as a kind of variable-free machine code. This implementation technique for functional languages was made popular by Turner[61], who designed the first viable implementation of a combinator-based compiler and reduction machine. (Students of combinatoric logic will note that functional programmers have renamed the combinators.)

Another combinator that is particularly useful is the const combinator, one which creates a constant function. Consider this definition:


const:: a -> b -> a
const k x = k

const 17 <anything> à 17

flip:: (a -> b -> c) -> b -> a -> c
flip f x y = f y x


10.6 Function Composition
Function composition is an operation which combines two functions to create a new (function-valued) result. Composition is written as an infix dot, defined as

Consider the example

filter (not . odd) [1..10] à [2, 4, 6, 8, 10]

The type signatures of the individual components in this expression are

odd :: Int -> Bool
not :: Bool -> Bool
(not . odd):: Int -> Bool
(.) :: (Bool -> Bool) -> (Int -> Bool) -> (Int -> Bool)

Note that the first three functions are first-order, but function composition is a higher-order function. Since the arrow is right associative, the rightmost parentheses may be omitted. The second operand of the composition function must yield a type which matches the domain of its first operand.

The composition operation constructs a new, anonymous function from its two operands, not and odd. The result is a function that can be passed as an argument to the filter function.
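As another small sketch (countOdds is our own name), composition also lets us name a new function without mentioning its list argument:

countOdds:: [Int] -> Int
countOdds = length . filter odd     -- countOdds [1..10] gives 5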

10.7 Exercises
1. Give the type declaration for twice, and compute the results of each application:
   a. twice f x = f (f x)
   b. twice inc 0
   c. twice twice inc 0
   d. twice twice twice inc 0

2. Define twice by using the infix function composition operator.

3. Use partial application to filter all elements in a list that are greater than 10.

4. Use the definitions of flip and . to show that (flip . flip) is the identity function.

5. In the Physical Sciences we can check to ensure that the units (i.e. time, mass, force) of the components in a formula combine correctly so as to be compatible with the expected result. In Computer Science this is analogous to a type-check. Use the type signatures of flip and . to verify that the type of (flip . flip) matches the type signature of the identity function.

6. For each of the expressions below, write down the instantiated type signatures of all the components, then combine the type expressions according to the form of the expression, and determine whether it is validly typed, and what the resulting type is.

infixr 9 .
(.):: (a -> b) -> (c -> a) -> (c -> b)
(f . g) x = f (g x)


a. 6 + 7

b. twice sqrt 16.0

c. filter (odd . (2 *))

d. if 6+7 == 8+5 then 1 else 2 (The if construct in Haskell is just syntactic sugar for a three-argument function, of which the second and third arguments must have the same type. Give the type signature for if.)

e. elem 10 [[1,2,3],[8,9,10]]

7. To illustrate that an equivalent of the if conditional can be written as a standard function in Haskell we provide the following definition:

   if' True tp _ = tp
   if' False _ fp = fp

   fact n = if' (n == 0) 1 (n * fact(n-1))

a. Provide the type signature for if’.

b. Could you write such a function in a conventional language like Java, Pascal or C#? Demonstrate, or explain why it is not possible.

8. Prove that function composition is associative.

10.8 Key Points
• Higher-order functions are functions that operate on or return other functions.

• New infix operators can be declared, and there are notations to allow binary operators to be used as prefix functions, and vice-versa.

• All functions can be curried. A partially applied function returns another function as its result, so it can be regarded as being higher-order. This result is anonymous, in that it does not have a name.

• Infix functions have a special section syntax that allows either (or both) operands to be omitted from an expression. Such an expression represents a function which is to be applied to the missing operands.

• Some special higher-order functions called combinators are used often: these include combinators for composing other functions, for creating constant functions, and for swapping the argument order of other functions.

10.9 Built-in functions you should know
const, flip, (.)


11 List Operators
We have already touched on different forms of list operators in some of the previous material, but in this chapter we collect together the material in a consolidated form. The development is heavily influenced by Wadler[66, 67, 68] and Bird[9, 10, 11, 12].

The fundamental tenet of the list operator style is that computation is often simpler if done “object at a time” rather than single component at a time. The stress here is on the list as the unit of computation, rather than on the individual elements of the list.

11.1 The Basic List Operators
Although there are a large number of potential operations that may be useful for manipulating lists, we concentrate here on a small number of elementary function types, and build up more complex operations in terms of these.

In this chapter we’ll regard the basic construction operator for lists to be ++, rather than the usual list constructor. Since either can be written in terms of the other, we lose no generality, but the symmetry of append (both its arguments are lists) simplifies matters slightly.

Generators, filters and the map function will also be regarded as fundamental building blocks. They can be reviewed from the previous material. In general, a generator creates a list object from one or more simple objects, a filter removes some elements of a list to produce another list, and map allows us to take a function which ordinarily would only operate on a single element, and to apply that function over a list of elements.2
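As a quick reminder of these building blocks (examples of our own):

[1..5]                  -- a generator: [1,2,3,4,5]
filter odd [1..10]      -- a filter: [1,3,5,7,9]
map (* 2) [1,2,3]       -- map: [2,4,6]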

The next class of list operators that we are interested in is the reduction functions. The following example implements right-associative reduction, and captures the essence of backward recursion.

The idea here is that foldr is given a binary operator, together with its right identity, and it combines all the elements of the list (from the right) into a single value by repeated use of the operator.

2It would be sensible to think of map as “strengthening” an elementary function like odd so that (map odd) is regarded as a more powerful version of odd that can operate on lists of Integral elements.

foldr:: (a -> b -> b) -> b -> [a] -> b
foldr op i [] = i
foldr op i (x:xs) = op x (foldr op i xs)

foldr (+) 0 [1,2,3] à (1+(2+(3+0))) à 6


It is equally easy to capture the essence of forward recursion, or folding from the left. Recall that forward recursion usually carries an accumulating parameter into the recursive levels, and the final result of the function is the final value of that parameter.

The prelude defines companion versions called foldl1 (foldr1) which perform left (right) reductions on non-empty lists. They don’t have explicit identity arguments, but use the first (last) list element as base value. Their type signatures are accordingly simpler.
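For reference, the standard prelude signatures are:

foldl1:: (a -> a -> a) -> [a] -> a
foldr1:: (a -> a -> a) -> [a] -> a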

The reduction functions operate on a list of values to produce a single result. (This result could be a data structure, of course). By contrast, the scan family of functions do the same sort of thing, but they return lists of partial results. There are also four variants: scanl, scanl1, scanr, scanr1.

Notice that the result list in these two cases is one element longer than the input list: the identity element itself is also included as the first (last) element. The variants for non-empty lists do not include this case:

This completes the basic tools. We now examine combinations of them.

11.2 Examples of the List-Operator Style
We begin with definitions for a few simple functions, defined in terms of list operators. For this we recall the const combinator:

const k x = k

Firstly, we define sum, product, factorial and length in list operator style. The type signatures are omitted:

scanl1 (*) [1..]     à [1, 2, 6, 24, 120, ..]
scanr1 (+) [1,2,3,4] à [10, 9, 7, 4]

scanl (+) 0 [1,2,3,4] à [0, 1, 3, 6, 10]
scanr (+) 0 [1,2,3,4] à [10, 9, 7, 4, 0]

foldl1 (min) [1,2,3] à ((1 `min` 2) `min` 3)
foldr1 (max) [1,2,3] à (1 `max` (2 `max` 3))

foldl:: (a -> b -> a) -> a -> [b] -> a
foldl f z [] = z
foldl f z (x:xs) = foldl f (f z x) xs

foldl (+) 0 [1,2,3] à (((0+1)+2)+3) à 6


The definitions of sum and product do not explicitly name their list argument, since the principle of extensionality (see Section 10.3) allows us to omit it. The leng functions operate by replacing each element in the list by a 1 (this transform is done by mapping the constant function 1 onto the elements of the list), then summing the result.

Next we examine another version of the elem function. To determine whether an item is a member of a list, map an equality predicate onto the list and reduce the result using ||, and its identity False.

Is this very inefficient? Not under lazy evaluation, since the outermost reduction will only force the mapping as far as is required. The following example illustrates this:

elem 15 [1..]  ⇒  True

Does reduction always have to reduce to a simple item? Not necessarily. Reducing a list under the construction operator just returns the list unaltered. Reducing under some kind of structure-building insertion operator will build all elements in the list into the structure:

We might like to try the previous example using forward recursion instead: this means we need a version of insert that takes its arguments the other way around. Fortunately, (flip insert) is just such a function:

The previous version builds a sorted list, but other versions of insert have been used to build trees. Suppose insertInTree puts its first argument into a tree.

sort xs = foldl (flip insert) [] xs
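(For reference, flip is a prelude combinator; a sketch of its definition:

flip :: (a -> b -> c) -> b -> a -> c
flip f x y = f y x

so (flip insert) xs x is the same as insert x xs.)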

-- A folding function can return a list
foldr (:) [] [1..10]  ⇒  [1,2,3,4,5,6,7,8,9,10]

sort xs = foldr insert [] xs

elem e xs = foldr (||) False (map (== e) xs)

elem 4  [1,2,3,4,5,6,7,8]  ⇒  True
elem 17 [1,2,3,4,5,6,7,8]  ⇒  False

sum     = foldl (+) 0
product = foldl (*) 1
fact n  = product [1..n]
leng xs = sum (map (const 1) xs)
leng2   = sum . (map (const 1))

sum [1,2,3,4]    ⇒  10
product [1..4]   ⇒  24
fact 6           ⇒  720
leng [1,4,2,6]   ⇒  4
leng2 [1,4,2,6]  ⇒  4


Turning a list into a tree would then simply become the listToTree fold shown below.

We now examine the qualifiers all and any. The first returns True if all elements satisfy some predicate, while the second returns True if any element satisfies the predicate:

The reduce/map pattern in each of these cases can be abstracted out as a useful function in its own right, and the definitions for some of the functions above can be made even more terse:

The scan family are useful for creating lists of values.

11.3 Homomorphisms

What functions can be written as combinations of foldr and map?

To examine the nature of lists we can define a simple algebra: the domain of the algebra is the set of all possible lists, and the only operation is the binary operator ++. The empty list is an identity element in this algebra.

For the record, an algebra with only one binary operator is a binary algebra. A binary algebra in which the operator is associative is called a semigroup. A semigroup which has an identity element is a monoid, and a monoid in which every element has an inverse element is a group. Our list algebra is therefore a monoid, but not a group, since lists do not have inverses under the ++ operator.

-- One way to produce a list of all factorials
facts = scanl (*) 1 [1..]

facts  ⇒  [1,1,2,6,24,120,...]

-- Looking for even more commonality
h op id g l = foldr op id (map g l)

elem e = h (||) False (== e)
all    = h (&&) True
any    = h (||) False
leng   = h (+) 0 (const 1)

all p xs = foldr (&&) True (map p xs)
any p xs = foldr (||) False (map p xs)

all odd  [1,3,5,7,9,11]  ⇒  True
all odd  [1,3,6,7,8,11]  ⇒  False
any even [1,3,5,7,9,11]  ⇒  False
any even [1,3,5,6,9,11]  ⇒  True

-- Here fold is used to build a tree!
listToTree xs = foldr insertInTree EmptyTree xs
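The notes do not show insertInTree or the tree type itself; a minimal sketch, assuming an ordinary binary search tree (the names and shape here are our own guess):

data Tree a = EmptyTree | Node (Tree a) a (Tree a)

insertInTree :: Ord a => a -> Tree a -> Tree a
insertInTree x EmptyTree = Node EmptyTree x EmptyTree
insertInTree x t@(Node left y right)
    | x < y     = Node (insertInTree x left) y right
    | x > y     = Node left y (insertInTree x right)
    | otherwise = t               -- quietly ignore duplicates in this sketch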



By characterizing our lists as a mathematical algebra we can call upon a powerful notion of “equivalence” between algebras. If we can map one algebra V1 (our list algebra, in this case) into another algebra V2 (say the algebra of integer numbers under addition, with the 0 identity), so that the combining operation in V1 (++ in our case) is “mirrored” by the combining operation in V2 (+ in this case), then V2 is a homomorphic image of V1. The function that maps between V1 and V2 is called a homomorphism.

More formally, a function h is a homomorphism from the list algebra if we can find some associative combining operator, say θ, so that

h (x ++ y)  =  (h x) θ (h y)

for all lists x and y; additionally, θ must have an identity element e, with

h []  =  e

We are now in a position to answer the question posed at the beginning of this section. We do so without proof. A function h can be written in reduce/map form, i.e.

h xs = foldr <op1> <u1> (map <op2> xs)

if and only if h is a homomorphism over the list algebra[9].

As an example, the following version of leng is written in reduce/map form, therefore it is a homomorphic mapping of the list algebra into the algebra defined over the natural numbers with addition as the operator and 0 as the identity.

leng maps lists to integers: The two properties required of a homomorphism are satisfied, and have been used previously as laws of Haskell:

The homomorphism property essentially requires that the operation h on the whole can be stated in terms of some combination, θ, of the operation on the component parts. With this in mind, it is simple to find operations that are not homomorphisms — just find one that does not depend on the individual elements, but on their occurrence in the same list. For example, a function

dups :: Eq a => [a] -> Bool

which returns True if the list contains any duplicate elements and False otherwise,

-- The homomorphism relationship between lists and leng
-- is explicit in our previous definitions
leng []       = 0
leng (x ++ y) = leng x + leng y

-- Are we still finding the length of a list?
leng l = foldr (+) 0 (map (const 1) l)

leng [1,4,2,6]  ⇒  4


is not a homomorphism on lists, and therefore cannot be written as a reduce/map list operation. If it was a homomorphism it would have to possess all these properties, which are clearly contradictory:

11.4 Listless Style

We now turn our attention to the efficiency of these methods. Wadler[67] notes that there is an inherent inefficiency in many of the list operator combinations. Consider the definition of fact given below.

The problem is that the generator constructs the intermediate list which is immediately reduced back to a single item. Wadler calls this a listful style of programming, and pinpoints the generation of the (unnecessary) intermediate list, and its subsequent traversal, as the major source of inefficiency. He proposes a source transformation that generates a listless version of the program. Rather than tackle a full-blown optimizer, Wadler’s system depends on the compiler having explicit knowledge about a small number of the basic list operators, (i.e. map, filter, reducers, generators) and the ways of combining them. Since the transformed program does not have to build as many lists, it uses fewer memory cells. The transformation therefore shifts part of the garbage collection overhead to compile time.

Note that the definition of h in Section 11.2 is listful: it uses map to build an intermediate list, then immediately uses foldr to compress the list into a single result. All the functions defined in terms of h therefore suffer from this inefficiency. Capturing the commonality has allowed us to concentrate our optimizations for the whole class into one place: the function h. The new listless version of h, which is claimed to do the same thing, is given below as h'.

To prove that this new one does exactly the same as the old one did, we use structural induction on xs.

h' op id g []     = id
h' op id g (a:xs) = op (g a) (h' op id g xs)

-- Factorial again
fact n = foldr (*) 1 [1..n]

fact 6  ⇒  720

-- Why dups cannot be written in fold/map style:
dups [x] = False
dups [y] = False
dups ([x] ++ [x]) = dups [x] <op> dups [x] = True
dups ([x] ++ [y]) = dups [x] <op> dups [y] = False


Required to prove H(xs):

    h' op id g xs = h op id g xs

Case H([]):

    h' op id g []
      = id                          (h'.1)
      = foldr op id []              (foldr.1)
      = foldr op id (map g [])      (map.1)
      = h op id g []                (h.1)

Case H(a:xs):

    h' op id g (a:xs)
      = op (g a) (h' op id g xs)             (h'.2)
      = op (g a) (h op id g xs)              (assumption)
      = op (g a) (foldr op id (map g xs))    (h.1)
      = foldr op id ((g a) : (map g xs))     (-foldr.2)
      = foldr op id (map g (a:xs))           (-map.2)
      = h op id g (a:xs)                     (-h.1)

11.5 Exercises

1. Prove that (map f) distributes over ++, i.e.

       map f (xs ++ ys) = (map f xs) ++ (map f ys)

2. Prove that map distributes over composition, i.e.

       map (f . g) = (map f) . (map g)

3. Under what conditions do left and right reductions yield the same results?

4. Under what conditions is left reduction equivalent to right reduction of the reversed list, and vice versa?

5. It may come as a surprise to learn that map can be written using foldr and functional composition. Prove the identity

       map f xs = foldr ((:) . f) [] xs

6. Prove the identity

       foldr f id (map g xs) = foldr (f . g) id xs

7. Prove the identity

       foldr f (foldr f id ys) xs = foldr f id (xs++ys)

8. A listful function to determine how many elements of a list satisfy some criterion can be defined as

       hits p xs = length (filter p xs)

   a. Write an equivalent listless function.

   b. Prove the equivalence.

9. Write a function map2 which allows us to apply binary functions to two lists of


arguments. For example,

       map2 (+) [1,2,3] [4,5,6]  ⇒  [5,7,9]

   (Assume the operator is applied up to the shorter of the two input lists.)

10. Give a type signature for the function h in Section 11.2.

11. Give a type signature for this function, and determine what it does.

       f = foldl g []
           where g x y = y : x

11.6 Key Points

• List operators operate on whole lists at a time.

• This can lead to higher abstraction levels for the programmer.

• The key list operator functions are map and foldr.

• Many functions can be written using a particular combination of map and foldr. We capture this pattern as a new function in its own right.

• The family of functions expressible in this way can be characterized by treating lists and the append operator as an algebra, and using the notion of homomorphisms between algebras.

• The reduce/map style is listful (generates temporary intermediate lists), with some inefficiency.

• Transformations exist to optimize listful functions into a listless style.

11.7 Built-in functions you should know

foldl, foldr, foldl1, foldr1, all, any, scanl, scanr


12 Lazy Evaluation

12.1 Introduction

One of the features of Haskell is that it is a lazy language. Laziness means that we don’t compute anything unless we absolutely have to, typically because the user wants to print a value or use its result for some other computation. The system consistently attempts to put off any computation until it cannot be postponed any longer. The mechanism is sometimes called call by need.

The idea essentially revolves around the order in which function applications are executed. We have mentioned that functional programs yield values that are independent of the order in which the subexpressions are reduced. The statement is only true up to a point. If two distinct reduction sequences both yield a result, the Church-Rosser theorem assures us that the results must be identical. But it is also true that one particular sequence of reductions may fail to terminate, while another will terminate. Even if they do both terminate, one sequence may take more steps than another.
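A tiny illustration of the point (our own example, not from the text): two reduction orders for the same expression, only one of which terminates.

bottom :: Int
bottom = bottom              -- a computation that never terminates

safeFst :: Int
safeFst = fst (42, bottom)   -- outermost (lazy) reduction returns 42;
                             -- innermost (eager) reduction would evaluate
                             -- bottom first and never finish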

The idea of non-terminating computation leads to some theoretical difficulty with the notion that every function application returns a value. We take non-termination into account in the theories by defining a new special value, ⊥, pronounced bottom. The bottom value is a value which is assumed to belong to every domain – so it can be returned by any function. For the purposes of the theory we say that the non-terminating function returns bottom.

Some authors extend this notion so that any function which produces an error, (say division by zero), is also assumed to return bottom for those cases. But not everybody agrees that this is sensible. Bottom normally indicates “nothing is known about the value”, whereas an error at least tells us something about the result. The purists argue that bottom is not an appropriate value if something, even the fact that there has been an error, is known. (I think the purists have a good case. If your program hadn't finished running by the end of the weekend, it certainly sounds different from "it crashed after two minutes". So to build into your theories that non-termination and crashes are the same result seems a bit questionable!)

But to more concrete matters: consider this example, where g is an unspecified function that may be extremely expensive (perhaps infinitely expensive) to calculate:

The function f builds a two element list, and the invoking expression prints the first element of that list. In the process, the value of (g n) is not used at all. An eager, or call-by-value evaluator would attempt to fully compute the components of the list before calling the head function to discard them, thereby doing unnecessary computation, and exposing the system to the possibility that (g n) may be non-terminating, or erroneous.

f n = [2*n, g n]

head (f 10)  ⇒  20


On the other hand, a lazy evaluator would tackle the problem in an outermost fashion by attempting to find the head of the expression, calling for that expression, getting back the list [2*n, g n] (with as-yet unevaluated components), and taking the head of this list, which would be (unevaluated) 2*n. The lazy evaluator then calculates 2*n and prints the result, and the work for evaluating (g n) is never even attempted.

Outermost (lazy) reduction can be shown to find a solution whenever one exists. For this reason some consider it semantically better than eager reduction. Our concerns in Haskell are not so much to consider the merits of these arguments, but to see how the lazy mechanism helps the programmer.

12.2 Unbounded Objects can Simplify Matters

We begin by looking at a simple example — the function that generates an infinite list of twos.

In this example the system (conceptually) generates an infinite list [2,2,2,...]. Since the system is lazy, the operation is safe provided that we do not attempt to access the end of the list, or do things like count the number of components. take will only probe into the list as far as is necessary, and in so doing will force that portion of the list to be explicitly created.

A particularly useful infinite list is denoted by the syntax [1..]. By omitting the upper bounds from the generators studied in Section 6.1, we generate infinite lists (lazily, of course).
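For example (our own quick illustrations):

take 5 [1..]              ⇒  [1,2,3,4,5]
takeWhile (< 20) [1,3..]  ⇒  [1,3,5,7,9,11,13,15,17,19]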

Does the programming freedom to work in terms of infinite lists simplify any algorithms? Consider the well-known sieve algorithm for generating prime numbers:

The algorithm starts by sieving the infinite list of integers starting from 2. Each invocation of sieve takes the head of the remaining list as a prime, and uses a list comprehension to filter all multiples of p from the rest of the list. (The limited machine capacity gets in our way quite quickly of course, but we abstract over that small difficulty.)

If we are asked to generate the first n primes we can do so by simply taking the first n elements from the infinite list of primes.

sieve :: [Int] -> [Int]
sieve (p:x) = p : sieve [n | n <- x , n `mod` p /= 0]

primes = sieve [2..]

twos :: [Int]
twos = 2 : twos

sum (take 10 twos)  ⇒  20


take 100 primes  ⇒  [2, 3, 5 ... 541]

Because of the laziness, this does no work other than that required to compute the 100 primes. By contrast, in order for the usual (array-based) algorithm in a conventional language to do the minimum amount of work, we would need to know beforehand that the 100th prime is 541, so that the programmer could limit the sieve and the size of the array. The “obvious” way of coding the solution in an eager language therefore requires more information than is needed in the lazy case.

12.3 Laziness is a Decoupling Mechanism

This section demonstrates that we can do away with conventional backtracking in a lazy language. Backtracking is an implementation strategy for optimizing searching procedures. It allows an efficient depth-first tree search over some search space. If no solution is found the system returns to a previous level in the tree and tries an alternative path. Instead of a depth-first search we could implement a level-by-level search. Most applications that use backtracking could just as well use a breadth-first search, except that it is often too slow to generate all solutions at one level before moving forward to the next level of the problem.

We attempt to demonstrate here that a level-by-level search is often a conceptually cleaner solution. But even if we program the search as a breadth-first one, the lazy system executes it in depth-first order, since it consistently puts off those parts of the computation that can be delayed. The result is that the programmer can realize the efficiencies associated with the depth-first search, but without having to explicitly program the backtracking. To illustrate we present a solution to the eight queens problem.

The problem is to place n queens on an 8×8 chessboard so that no queen attacks another. The rows and columns of the board are numbered from 1 to 8, and placing the queens generates a result of the form [p1, p2, ... pn] where each pi is an integer giving the row number for the placing of the queen in column i. The solution is recursive in nature: to place n queens, first find all solutions to the problem of placing n-1 queens, and for each of these solutions, try queen n in every possible position until a safe position is found. In this solution the queens are tried from the right hand side of the board.


The program combines a number of features that we have seen before. The generator b <- queens(n-1) generates all solutions to the smaller subproblem. A current position for q is drawn from all possible positions, and checked for safety.

In this example the two generators in the list comprehension are independent of each other and can occur in either order. Calculating all the possible placements of n-1 queens is substantially more work than testing the current queen at positions 1 through 8, so we would be wise to ensure that the bigger subproblem does not get solved eight times. In a conventional language one would need to make sure that the cheap operation was done in the inner loop. The guaranteed absence of side-effects in a functional language ensures that every expression can be safely moved out of any scope that it does not depend on. This is known as lambda lifting[49]. A clever compiler will lift the expressions so that one is not mindlessly embedded inside the other when they are independent, and the program does (more or less) the same amount of work in both cases. (Hugs is a lightweight interpreter: it doesn’t do any very sophisticated transformations such as this.)

Watching the operation of the two cases is informative, and gives some intuitive feel for lazy evaluation. If the expensive task occurs to the left of the cheap task it gets put off, so the first solutions occur much sooner, and quite regularly thereafter. If the cheap generator occurs to the left, it gets put off, while the hard work gets done at the outset. It takes longer to find the first solution, but once the ground work has been done the other solutions occur quite rapidly.

12.4 Discussion and Examples

Hughes[41] argues convincingly that lazy evaluation is a vital tool for promoting modularity and good design. It essentially gives the programmer all the flexibility and power of coroutines, without any of the usual syntactic and semantic overheads. Consider a simple generate-and-test sequence over a search space. Conventional solutions tend to fold the testing steps into the state generator, so that the generate/test steps can be conveniently interleaved. A preferable solution is based on the producer/consumer model, where one module is responsible for generating the states, while another is responsible for filtering them. In this way both parts become re-usable independently of one another, and it is easy to take advantage of the usual consumer/producer opportunities for parallelism.

type Board = [Int]

queens :: Int -> [Board]
safe   :: Int -> Board -> Int -> Bool

queens 0 = [[]]
queens n = [q:b | b <- queens(n-1) , q <- [1..8] , safe q b 1]

-- is q safe on b hdist cols away?
safe q [] hdist = True
safe q (a:x) hdist
    | q == a       = False
    | q-hdist == a = False
    | q+hdist == a = False
    | otherwise    = safe q x (hdist+1)
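As a quick sanity check (our own examples, assuming the definitions above):

queens 1           ⇒  [[1],[2],[3],[4],[5],[6],[7],[8]]
length (queens 8)  ⇒  92     -- the well-known count of 8-queens solutions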


The complexity of synchronizing the two activities falls away completely in a lazy regime.

In recent times C# has adopted some new syntax based on this idea. If you are a C# programmer, check out the yield keyword. (It's not the first time we've seen this in conventional languages, coroutines have been around for a long time, but it seems that their syntax and ease of use is finally approaching what lazy functional programmers have taken for granted.)

Although lazy evaluation has introduced a powerful new tool into the programmer’s toolkit, it is not without some difficulties and surprises. At this time we have a limited understanding of some of the issues related to lazy evaluation, and a lazy implementation often violates our expectations of an algorithm’s performance. We have already seen in the previous section that an algorithm which was programmed essentially as a breadth-first search executed in depth-first fashion. These behind-your-back surprises can make things tricky.

As our next example we examine three functions based on Turner’s KRC[62] language. The first inserts an item into an ordered list. The second is a sort by repeated insertions. The third function finds the minimum of the list by sorting the list, and taking the first element.

The surprise is that although the performance of the insertion sort is O(n²), finding the minimum as the head of the sort of an unsorted list executes in linear time.

Say that again! (sort xs) is an O(n²) algorithm, but (head (sort xs)) executes in linear time!

To see why this is the case, recall that lazy computation is only done when the result is needed. Because minl never probes the list beyond the first element, it never forces insert to complete what it was asked to do. The truncated version of insert that actually reflects the computation is effectively

insert :: Ord a => a -> [a] -> [a]
minl   :: Ord a => [a] -> a
sort   :: Ord a => [a] -> [a]

insert a [] = [a]
insert a l@(b:u)
    | a < b     = a : l
    | otherwise = b : insert a u

sort []     = []
sort (x:xs) = insert x (sort xs)

minl list = head (sort list)

minl ["joe","sam","bill","evelyn","thandi"]  ⇒  "bill"


In this form we can see that this function merely returns the minimum of a and the item at the head of the list. Sort repeatedly calls this function to deliver the minimum of the whole list, in linear time.

Here laziness has worked to our advantage. But it is not hard to produce examples where the opposite is true. For example, to find the maximum we could either modify the insert to create a descending list, or we could extract the last element of the sorted list. The performance in the two cases is significantly different.
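A sketch of the two alternatives (our own code, reusing the sort defined above; insertDesc, maxl1 and maxl2 are invented names):

insertDesc :: Ord a => a -> [a] -> [a]
insertDesc a [] = [a]
insertDesc a l@(b:u)
    | a > b     = a : l
    | otherwise = b : insertDesc a u

maxl1, maxl2 :: Ord a => [a] -> a
maxl1 = head . foldr insertDesc []  -- lazy head of a descending sort: roughly linear
maxl2 = last . sort                 -- must finish the whole ascending sort: O(n²)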

Another advantage was hinted at in previous exercises where we determined whether a number was prime by finding a list of all factors, then testing if the list was empty. There the lazy execution ensures that the full list of factors need not be built – as soon as the first factor is generated the test for emptiness can terminate.

There is a price to be paid for lazy evaluation. One major source of inefficiency is in the overheads that accrue from delaying expressions that must eventually be evaluated. Consider this code for fact:

There is no point in postponing the calculation of n-1 in the second leg, since the value will be needed to do the pattern match against 0 in the next call of fact. Putting off the computation, and then being forced to do it anyway results in unnecessary overheads. Typically, the calling environment and the expression to be delayed have to be packaged up into a closure which is passed to the called function. The literature[20, 54] proposes a form of strictness analysis in which the functions are analyzed to determine which arguments will always be evaluated. The function is said to be strict in these arguments. These can be evaluated eagerly (or by separate processors) so as to avoid the overhead of delaying them. Better code can also be generated in the function if the arguments are known to be in reduced form. Of course the caller and the callee must observe the same conventions: if the caller passes an unevaluated argument the callee must evaluate it before use.
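One way a programmer can force such an argument by hand is with the standard seq primitive; a sketch (ours, not from the text):

sfact :: Integer -> Integer
sfact n = go n 1
    where go 0 acc = acc
          go k acc = let acc' = acc * k
                     in acc' `seq` go (k - 1) acc'   -- force the accumulator now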

It gets really tricky when functions are called indirectly, via a higher-order function like filter, which knows nothing about which function it is calling. Since it cannot make strictness assumptions, it must generate code for the most general cases. Consequently, even functions with strict arguments will get called lazily in some cases. This forces the called function to expect the worst and be prepared to evaluate all its arguments as if they were lazy. Fortunately, an extra round of evaluation can do no harm to an already evaluated value, (but it might slow things down). This is because the result of evaluating an expression is a fixed-point of the evaluator.

(Refresher: a value x is a fixed-point of a function f iff (f x) = x. This property means that it does not matter how many times f is applied, the same result is produced.)

fact 0 = 1
fact n = n * fact (n-1)

insert a [] = [a]
insert a l@(b:_)
    | a < b     = a : < work never done >
    | otherwise = b : < work never done >



Another related source of inefficiency in lazy evaluators has recently attracted some attention [42, 54, 69, 72]. Lazy programs sometimes exhibit space leaks, a phenomenon that causes them to consume more storage (sometimes a worse order), than might be expected from an eager reducer, or from the intuitive formulation of the problem. More importantly, it has been shown that for some problems any lazy formulation will be subject to space leaks unless some form of parallel reduction is employed[42]. One instance of the problem arises when the evaluator delays the computation of an expression such as (head xs). The delayed expression might contain the only active reference to xs, so that forcing its computation would potentially free any structure referenced by the tail of xs. Delaying the computation forces the system to hang onto this tail. Wadler[69] plugs some of the leaks by having the garbage collector detect and evaluate these special cases. It is interesting to note that by coupling the extra evaluation logic into the garbage collector the system simulates some form of parallelism – the reductions done by the collector are not coupled to the normal sequential progress of the computation, and this small amount of “pseudo parallelism” is enough to fix this class of space leak.
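A classic illustration (our own example, not from the cited papers): computing an average with two traversals of a lazily generated list.

mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- In  mean [1 .. 100000]  the list cannot be reclaimed while sum consumes it,
-- because the pending length still needs it: O(n) space instead of O(1).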

The space leak problem appears to be symptomatic of a more general source of inefficiency: that postponing computations, even those that might never be needed, might involve more effort than doing the work in the first place. There are no definite answers to these questions at this time.

Finally we comment on debugging of programs. One of the difficulties with any program transformation scheme (including traditional compilation) is that although the transformed program has the same input/output behaviour as the original, the execution model and sequence of steps may disagree with the programmer’s conceptual model of what is occurring. Extracting sensible history information from an erroneous computation state is a difficult task – some computations have been delayed, some forced, and in an order that is often counter-intuitive. More work is needed to create good debugging aids in the presence of lazy evaluation, coroutines and parallelism.

12.5 Exercises

1. The nfib function below is widely used by implementors as one measure of the speed of function calling in their implementations. How is the result of nfib related to the number of nfib calls? Test your functional language implementation (and some other languages) and compute the number of calls per second.


nfib :: Int -> Int
nfib n | n <= 1    = 1
       | otherwise = 1 + nfib(n-1) + nfib(n-2)


2. Consider the Quicksort algorithm presented in Section 6.5. Will the expression head (qs xs) execute in linear time?

3. Most functional programming texts tell us that the primitive arithmetic operators are strict in all their arguments. Consider this definition of myTimes

Construct a program that works using myTimes, but fails when the built-in primitive (*) is used.

4. In each of these definitions, give the type declarations and identify which arguments are strict.

5. For more than a century it was conjectured that any planar (flat) map could be coloured with no more than four colours, so that adjacent countries always had different colours. The proof that four colours are indeed enough has only been discovered recently. Devise a representation for a map (the only relevant information is which countries border on which others), and write a program to find a colouring for an arbitrary map.

6. A program from Coxhead[22] solves a little puzzle. There are a number of playing pieces occupying some of the positions on a one dimensional board. The program must find any (or all) sequences of moves from some starting state to some goal state. The puzzle is intended to demonstrate the main ideas from searching – i.e. state spaces, transition or successor functions, keeping a path, breadth first, depth first and best first searching, tree pruning and heuristic estimators of closeness to the goal.

abs n | n >= 0 = n
      | n < 0  = - n

all pred []    = True
all pred (a:u) = (pred a) && (all pred u)

any pred []    = False
any pred (a:u) = (pred a) || (any pred u)

filter p [] = []
filter p (a:u)
    | p a       = a : filter p u
    | otherwise = filter p u

map f []    = []
map f (a:u) = f a : map f u

elem e []    = False
elem e (a:u) = (e == a) || elem e u

foldr op id []    = id
foldr op id (a:u) = op a (foldr op id u)

myTimes 0 _ = 0
myTimes n m = n * m


The game rules are that a piece may move forward to the next empty square, or it may jump forward over an adjacent piece, provided that if it jumps, it must jump as far as possible in one move (like the jump rule in checkers). If we start with the configuration art... (where the dots indicate open board positions), and the goal is ...tar, some possible paths to the solution are

Write a program which, given a starting state and a goal, finds all possible solutions.

12.6 Key Points

• Call-by-need (lazy evaluation) only evaluates expressions on demand.

• Because we do not actually build the structures unless they are needed, we can process the front portion of infinite lists or structures.

• Infinite data structures can simplify our approach to solving some problems.

• In a sense, laziness gives us some of the power of coroutines and backtracking without having to explicitly program for it.

• Performance of lazy algorithms may be difficult to predict, and counter-intuitive.

• Space usage and debugging may also suffer.

art...  a.tr..  .atr..  .at.r.  .a.tr.  ..atr.  ..at.r  ...tar
art...  a.tr..  .atr..  .at.r.  .a.tr.  .a.t.r  ..at.r  ...tar
art...  a.tr..  .atr..  .at.r.  .at..r  ..ta.r  ..t.ar  ...tar
art...  a.tr..  .atr..  .at.r.  .at..r  .a.t.r  ..at.r  ...tar


14 Stateful computation through Monads

14.1 We need stateful computation too!

Although the no-side-effects and order-of-computation-does-not-matter philosophy of functional programming is elegant and powerful, real programs need to create side effects! You need to write files to disk, update your bank balances in the database, or manage the window-layout arrangement on your screen. And if you're writing lines of output to a file, the order is important!

Oops! What do we do now? Haskell researchers came up with an interesting compromise:

Let's permit side effects and explicit sequencing for some computation, but we'll keep a strong firewall between the parts of the program that are manipulating state (usually in the external world), and the pure functional parts. If the stateful bits of the computation cannot "leak" back to the purely functional parts, we'll have the best of both worlds. Our type system can enforce the firewall.

The Haskell program below expects two arguments (representing filepath names) on the command-line when it runs, and it copies the file, writing a new file. We'll examine the detail as the chapter progresses: the fragment is presented upfront as an example of what we're working towards.

14.2 How did they do that?

Imagine you start with an elegant, pure pipeline computation (just a few steps, shown here with the data moving left-to-right through the computation):

One way to "augment" this system for manipulating state would be to add an extra input – the current system state – to each function, and get an extra value – the new system state – out of each function. It requires much "plumbing": instead of passing a single stream of values through the pipeline, each pipeline stage now consumes a pair (value, state), and produces a pair (newvalue, newstate).

import System

main :: IO ()
main = do [src, dst] <- getArgs
          contents   <- readFile src
          writeFile dst contents

-- Run as  :main "src.txt" "dst.txt"



Haskell has enough syntax machinery so that this extra state that has to be plumbed through the pipeline can mostly be hidden from the programmer. So as you can see in the fragment of code given above, the designers deliberately made the syntax look imperative and sequential, to give a familiar feel to conventional imperative programmers, and to highlight and emphasize the differences between "classic pure Haskell" and this new mode of working.

The next trick is to "externalize" the state, so that we can make it mutable: each stage of the pipeline now has the ability to read the external state (e.g. read a file, or get the command line arguments), or to make modifications to the external state (e.g. by writing a file, or deleting it from the filesystem).

The data type that carries the state through your pipeline behind your back is a kind of Monad: the term has quite deep roots in a branch of mathematics called Category Theory. We're not going there in these notes.

In Haskell, a Monad is a class. So when we define a new data type (e.g. perhaps a BinarySearchTree), we can make it monadic (capable of carrying extra hidden state) by implementing the handful of functions required to support the Monad class.
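For reference, here is the class as it is given in the Haskell 98 prelude (these are the handful of functions just mentioned; we reproduce them rather than define them ourselves):

class Monad m where
    return :: a -> m a
    (>>=)  :: m a -> (a -> m b) -> m b
    (>>)   :: m a -> m b -> m b
    fail   :: String -> m a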

Looking again at the diagram above, we now have two inputs and two outputs from each stage of the pipeline, but we plan to keep the lower plumbing hidden. Each stage is a two-pronged "pluggable component". We'll almost always want the data flowing through the upper level to be polymorphic, so our new monadic data type will be parameterized by a type parameter.

(Use the Hugs help to examine the classes, look up Monad, find its four methods, and the three instances of Monad that are already defined in the prelude.)


14.3 The type (IO a) is a Monad

The first monadic type we're going to work with here is already defined in the prelude for us: (IO a). It represents a computation pipeline that will eventually return a value of type a. But along the way we can carry (and manipulate) the state of the file system and machine environment, so our pipeline stages can perform input and output, and can change objects in the file system. And there is a free bonus for this particular type: the main function in a program, the one that runs when you launch a Haskell program from the command line, is already of type IO (). So when main starts up, its initial encapsulated state is that of your computing environment – your file system, environment variables, command line arguments.

If you look at the diagram above, we've labelled the bubble IO. The type declaration for IO means that each function we define of type IO has its extra connectors and the ability to plug into IO. There are other monadic data types, and you can define your own too: but they are all distinct from each other, and only functions of type IO <something> can manipulate the state encapsulated within the IO bubble.

A function that is of type, say, b -> IO [Int], says to us "besides propagating the IO state and being able to manipulate it on my lower prongs, my top prongs expect an input of type b and produce an output of type [Int]". (The category theorists are excused for cringing at this explanation!)

Not all functions used in the pipeline need to plug into or make use of the state. So ordinary, non-monadic functions can be used too. And not all monadic functions need to take values from their pipeline predecessor, nor do they have to deliver outputs to their pipeline successor. So it is possible to build a computational pipeline that could look like this:

In this case, the first step a might read the contents of an external file into the program, stages b and c are ordinary Haskell functions to manipulate the string (perhaps to encrypt the string), and the final stage d would write the results to a new external disk file.

So a and d are monadic functions of type IO <something> whereas functions b and c are just ordinary Haskell functions that don't know or care about the external state.
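A hedged sketch of such a pipeline (our own code; the "shouting" transform is just an invented stand-in for the pure stages b and c):

import System
import Char (toUpper)

main :: IO ()
main = do [src, dst] <- getArgs                            -- part of stage a
          contents   <- readFile src                       -- stage a: reads the world
          let shouted = map toUpper contents               -- stage b: pure
              tagged  = "-- transformed copy --\n" ++ shouted   -- stage c: pure
          writeFile dst tagged                             -- stage d: writes the world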

Depending on the energy of the library implementors, there are a number of useful functions that can manipulate this state – e.g. by writing files to disk.

The remainder of this chapter is going to show some examples of using the IO data type in Haskell.


14.4 Examples

14.4.1 Hello!

Here is a complete simple interactive example that we could use with absolute beginner programmers. It uses some console-based input and output.

To run this program using Hugs you have two options

i. From within the Hugs interpreter, type (notice the leading ':') :main

ii. Directly from the DOS command line, (assuming your path is set up to find runhugs) type runhugs progname.hs

14.4.2 List a file to the console

Our next example lists the contents of a file to the console.

A couple of points to note:

• The do construct is layout sensitive and allows us to code "statements" sequentially. It is the syntactic sugar that sequences the stages.

• This time we gave the type signature for main explicitly. The data type IO takes a type parameter. main always has type IO ( ) where the ( ) notation denotes the so-called unit type in Haskell: this is loosely similar to Java or C#'s keyword void.

• The getArgs function pulls a list of command-line arguments from the environment. So when we run this program we need to supply the name of the file we want listed. To make getArgs available we have to import module System.

• The type signature is

      getArgs :: IO [String]

  which says "getArgs is a pluggable monad component which will return a list of String. It doesn't need any input arguments."

import System

main :: IO ()
main = do [filename] <- getArgs
          contents   <- readFile filename
          putStr contents

-- Run as  :main "src.txt"

main = do putStr "Hi, what is your name? "
          theName <- getLine
          putStr ("We hope you enjoy Haskell, " ++ theName ++ ".")



• The <- binding operator works just like it does in a list comprehension. You can introduce a new name on the left (or use a pattern if you prefer), and the name is in scope for the rest of the statements in the do construct. We've used a pattern here that says we expect a singleton – exactly one – argument to main.

14.4.3 Copy a file from src to dst

Of course, we'll want to learn to write new files! This time we need to run main with exactly two command-line arguments, the name of the source file, and the name of the destination file to which to copy it, so we use a pattern to match those...

• The prelude contains an alias type declaration

      type FilePath = String

• and the relevant type signatures are

      readFile  :: FilePath -> IO String
      writeFile :: FilePath -> String -> IO ()

  IO () here denotes a monadic function that can propagate the IO state, but one that doesn't return anything of interest.

14.4.4 List all lines containing some word ...

import System

main :: IO ()
main = do [word, src] <- getArgs
          contents    <- readFile src
          let linesOfInterest = unlines (findEm word contents)
          putStrLn linesOfInterest

findEm :: String -> String -> [String]
findEm word cts = [ line | line <- lines cts, elem word (words line)]

-- Run as  :main "soccer" "src.txt"

import System

main :: IO ()
main = do [src, dst] <- getArgs
          contents   <- readFile src
          writeFile dst contents

-- Run as  :main "src.txt" "dst.txt"


Notables here:

• This demonstrates how main, with its monad plumbing, can easily call into pure conventional Haskell for the computation. There are four "stages" or statements in this pipeline, but the third has no interaction with the IO state at all.

• The prelude function lines breaks the content of a file into a list of lines. The function words splits a line into a list of words. The unlines function is the inverse of lines – it packs a list of strings into a single string, separated by newlines.
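For example (our own quick checks of these prelude functions):

lines "first\nsecond\n"       ⇒  ["first","second"]
words "find the soccer news"  ⇒  ["find","the","soccer","news"]
unlines ["first","second"]    ⇒  "first\nsecond\n"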

14.5 Further reading

The Haskell site (http://www.haskell.org/) has some excellent tutorials. You can also find some more examples of the kinds of library functions for monads at http://members.chello.nl/hjgtuyl/tourdemonad.html: the whole story goes quite a bit deeper than the quite superficial treatment we've given in this introduction.

14.6 Key Points

• Real programs need to manipulate state occasionally.

• Haskell has a mechanism to do so, based on the idea that we can firewall the pure and impure parts from each other.

• Some new syntax in Haskell, the do construct, lets us sequence stages of the computation and behave like imperative programmers once again!

• The IO monadic type gives us access to the state of the external world – environment variables, command-line arguments, the file system, and so on.

• The function main in a Haskell program is already of type IO( ), making it dead easy to get started.

• We've shown examples to read and write strings to the console, to collect command-line arguments, and to read and write files to the file system.


15 Circular Structures

15.1 Introduction

Assume that a program produces a data structure, perhaps a list of items, but the generator program feeds the output back into its input, so that later items in the structure depend on items that have already been produced. Such an arrangement forms a loop which we call a circular, or self-referential data structure. Here the data structures are recursive, as opposed to all the examples in the earlier material where the functions are recursive but the data structures are non-cyclic.

This chapter shows some examples of the expressive power of these circular techniques. The development is heavily based on Henderson[36].

The first example generates a list of ones, by using a simple feedback.

If we think of ones as a function without parameters, this looks like a standard recursive function call. If we think of ones as a data structure, we can regard this as a structure that is defined circularly. Henderson[36] illustrates this function as a network of elements which pass streams of information to one another. The network diagram for this case can be shown as

The triangle on its side depicts the list constructor operation, where the two inputs are combined to produce a single (list-valued) result.

The next example shows a slightly more complex fragment for generating a circular list of integers.

let ints = 1 : map (1 +) ints in ...

Once again we show Henderson's network diagram.

15.2 You’re Into a Time-Warp

The combination of lazy evaluation and circular structures can interact in a counter-intuitive way.

-- The list [1,1,1,1,...]
let ones = 1:ones in ...


This section solves a simple problem: find the minimum element in a list, and return a new list with that minimum added onto each of the original elements. The obvious solution requires two passes over the list: the first to find the minimum, and the second to add it to each of the original elements. The solution presented here does the job in a single pass. The trick is to use the minimum to construct the new list before the minimum has actually been determined!

The onepass function does two jobs, and returns its two results by tupling them into a new record. The first component of the tuple holds the minimum element found so far, while the second holds the new list. The lazy evaluation allows the expression a+minv to be constructed before minv is actually known. This is because the data constructors in Haskell are lazy, and do not evaluate their arguments before building the structure.

15.3 Prime Numbers Revisited

A conventional method for finding prime numbers is to test whether the number is divisible by any of the previous primes that have already been found. Of course, one only has to look at primes up to the square root of the number (or until the square of the prime is more than the number, if floating point arithmetic is to be avoided). The list is ideally represented by a circular data structure. Treating 2 as a special case allows us to get the list started. The decision to examine only the odd numbers from 3 is a non-essential optimization.

15.4 Hamming Numbers

A problem which often occurs in the literature is attributed to Hamming: it requires

-- The cycle is indirect here
-- primes needs factors, which in turn depends on primes
primes = 2 : [n | n <- [3,5..], null (factors n)]
         where factors n = [i | i <- takeWhile (\p -> p*p <= n) primes,
                                n `mod` i == 0]
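For example (our own check of the definition above):

take 10 primes  ⇒  [2,3,5,7,11,13,17,19,23,29]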

-- One pass increments all values by the list minimum
incmin :: [Int] -> [Int]
incmin [] = []
incmin l  = newlist
    where (minv, newlist) = onepass l
          onepass [a]    = (a, [a+minv])
          onepass (a:xs) = (min a b, (a+minv):bs)
                           where (b,bs) = onepass xs

incmin [3,1,4,1,5,9,2,6]  ⇒  [4,2,5,2,6,10,3,7]


generating a strictly ascending sequence (i.e. no duplicates) of all numbers such that

• 1 is in the sequence

• If x is in the sequence, so are 2×x, 3×x and 5×x.

• No other values are in the sequence.

Rather than generate all numbers and test them for the necessary properties, we note that the required sequence can be directly generated from the definition above. Note that if we already have the first few Hamming numbers 1, 2, 3, 4, 5, 6, 8, 9 ..., we can generate three new lists of Hamming numbers by multiplying these lists by 2, 3 and 5 respectively. The problem reduces to one of merging the three resulting lists in sequence, without duplicates. Merging three lists reduces to two merges of two lists each. (Note that this version of merge removes duplicates.)

A Henderson diagram again captures the cyclic nature of this computation.

15.5 Fibonacci Numbers

We return once again to the Fibonacci numbers, and introduce a very elegant and efficient algorithm for their computation. Consider this constructive specification for the Fibonacci sequence:

x_1 = 1
x_2 = 1
x_i = x_(i-1) + x_(i-2),   for all i > 2

Let f_i be the Fibonacci sequence starting from the i’th element.

merge3 :: [Integer] -> [Integer] -> [Integer] -> [Integer]
merge3 x y z = merge x (merge y z)
    where merge p@(a:x) q@(b:y)
              | a < b     = a : merge x q
              | a > b     = b : merge p y
              | otherwise = a : merge x y

ham :: [Integer]
ham = 1 : merge3 [2*i | i <- ham] [3*i | i <- ham] [5*i | i <- ham]


The following relationships hold:

f_1 = x_1, x_2, x_3, ... = x_1 : f_2
f_2 = x_2, x_3, x_4, ... = x_2 : f_3
f_3 = x_3, x_4, x_5, ... = the element-by-element sum of f_1 and f_2

Henderson’s diagram for this network is given first:

Now the Haskell fragment which implements the necessary equations.

Some substitution leads to the following even more concise version:

15.6 Discussion

Lazy evaluation has enabled us to view computation as a network of co-operating processes which produce and consume streams of data. The call-by-need formulation has freed us from the need to explicitly sequence the producers and the consumers. The infinite structures free us from having to program for boundary cases and termination. In this chapter we have taken this model one step further by allowing feedback loops in the computation. The results are elegant and the executable program is sometimes uncannily close to the original problem specification.

Besides Henderson’s treatment, Field and Harrison[27] also have an interesting treatment. Friedman and Wise[29] is an influential paper on lazy evaluation and stream based programming, while Allison[2] is a good contribution. Other references that address stream-based programming and circular structures are Abelson and Sussman[1], and Ida and Tanaka[45].

Not all cyclic definitions are well-founded. The computation can effectively deadlock itself if it waits for values that have not yet been produced. Sijtsma[58] studies

-- Finally, we can now write fibs as a one-liner!
fibs = 1 : 1 : [a+b | (a,b) <- zip fibs (tail fibs)]

fibs :: [Int]
fibs = f1
    where f1 = 1 : f2
          f2 = 1 : f3
          f3 = [a+b | (a,b) <- zip f1 f2]
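Either definition gives the familiar sequence (our own check):

take 10 fibs  ⇒  [1,1,2,3,5,8,13,21,34,55]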


these cyclic data structures and clarifies the notions of when such a cyclic data structure is well defined. For example, he shows that the following definition is well-defined for any even number of tail operations, but it fails for an odd number.

let m = 1 : head (tail (tail (...(tail m)))) : m in m

The next example from his paper is well defined provided there is at least one tail operation in the first expression.

let m = head (tail ...(tail m)) : 2 : tail m in m

15.7 Exercises

1. (Suggested by Herman Venter). The problem of generating Hamming numbers has an extremely elegant solution if the intermediate data structure is a set rather than a list. In essence, the set mechanism automatically gets rid of duplicates. The only requirement is that there should be a way of finding the minimum element in the set. Sketch such a solution in the language of your choice.

15.8 Key Points

• Circular data structures depend on laziness.

• A circular structure provides feedback, so that computation can depend on previous results.

• This style is well described by simple diagrams.

• It allows direct translation of some recurrence relations into Haskell.

• It is possible to use (as yet unknown) results in the function that is generating them.

• Not all such uses are well-founded. Caveat Emptor!


16 A Broader Picture, and Further Readings

16.1 The Style

What is a functional language? We answer this question by a brief comparison with conventional languages. There are a number of ways in which functional languages are different: the form of program construction (what the code looks like), the operational models used to describe their execution (how the computer executes the code), and the semantic descriptions needed to reason about them (saying what a given bit of code actually means)!

We concern ourselves mainly with the first of these, and define functional programming as a style of programming in which the primary form of building programs is by applying functions to some arguments - function application.

Functions are combined to produce more powerful functions until finally we write a function which, given certain input, produces the required result. To give an example, we present a Haskell function which determines whether an integer number is prime or not:

The type declaration specifies that the function isPrime maps an integer argument to a boolean result. The definition specifies that n is prime if it has no factors. The (proper) factors of n are defined to be a list of numbers x, such that x is drawn from the range [2..n-1] and the value of (mod n x) is zero.

What are the main differences between this and a typical program in a conventional language?

Firstly, the functional program is more like a formal definition of the properties of a prime number. The emphasis is shifted more towards what constitutes a prime number (i.e. a specification), rather than how one goes about finding primes. Such programming styles are called declarative, as opposed to the more conventional imperative style, where the emphasis is on what steps should be taken. The imperative languages focus more attention on how to get to the result, by using assignments and loops to manipulate variables, and by exercising careful control of the order in which these steps must happen.

Secondly, we do not allow updating of variables. Names in a functional language stand for values, whereas they stand for locations in imperative languages.



Thirdly, there is no strong concept of the flow of control. Note that in our example each element in the range [2..n-1] is tested to see whether it is a factor of n, but the algorithm does not specify any explicit sequencing. An implementation with k processors could quite legitimately break the range [2..n-1] into k subranges and distribute the work.

16.2 The Reason

Functional languages are awkward in some situations. In spite of this, there are compelling reasons for exploring new styles of specifying computation. A major computing goal is to control the escalating costs of producing and maintaining software. We still see disturbingly high bug-rates which seem particularly resistant to all forms of cure.

Part of the problem seems to be that conventional programming languages provide an optimized control language for programming conventional von Neumann machines – machines that consist of a single instruction pointer, a processor with some registers, a stack, and a large number of modifiable memory locations. After nearly fifty years of experience our languages have evolved into finely tuned systems for controlling these resources – we use variables to provide a neat syntax instead of using the absolute memory addresses, and use assignment statements to modify the memory locations. The random access arrays and the control structures in our conventional languages directly reflect the nature of the underlying machine with its linear memory and single instruction pointer – we regard a program as a sequence of steps, to be done in a fixed order, in the same way as the underlying hardware does its steps.

While we struggle to improve reliability and productivity in software production, we are simultaneously faced with new challenges to exploit hardware advances by effectively distributing computation. Many of these ideas are being driven by the Web, in which hundreds of thousands of interconnected and cooperating processors can all be working simultaneously on different parts of the same problem. Unfortunately, the very features that make the von Neumann machine so successful, i.e. the modifiable linear memories and a single flow of last-in-first-out procedure-based control, are the things that work against the move to parallelism.

Advocates of structured computer architecture might counter this argument, observing that if the layers of software, operating system, firmware and hardware are properly separated we ought to be able to replace the lower levels without any changes to our high-level view of the machine. Unfortunately, the von Neumann machine ideas are so deeply ingrained in our high-level programming languages that we may not be able to break from the single-processor hardware unless we also break from the von Neumann languages. Thus the search is for alternatives at the top end of the hierarchy – the language levels. Since functional languages offer some promise, they bear further investigation.

Significant progress towards higher productivity has been made in the short history of computing. It is interesting to note that almost all the major improvements in productivity and reliability have occurred from restricting, rather than extending, the features available to the programmer. Concepts like object-oriented design, strong typing, modularity, public and private access, and information hiding are all based on the idea of introducing more formal controls into the languages, and only allowing things to be done in certain “approved” ways. Gone are those wild-west days when anything was acceptable, provided it worked!

Functional programming can be viewed as yet another step in that direction. But it is a large step, more radical than its predecessors. It may in fact have the potential to solve two problems at once: by restricting conventional languages even more, we can develop more powerful ways of reasoning about program construction, thereby improving reliability. These very restrictions also hint at a simultaneous solution to the multi-processing problem.

Changes in the past were not always accepted immediately. The cries of “What! Program without GOTO’s? Declare variables before using them? Explicitly convert between types?” and so on will no doubt be echoed again, this time as “What! No variables?”.

To illustrate the difficulty with conventional languages we examine some simple C-style statements:

K = F(M)*N + F(N)*M;      (1)
K = F(M) + F(M);          (2)
K = 2 * F(M);             (3)
if (G && G && G) ...      (4)

Can the two function calls in the first statement be done in either order? Can they be executed simultaneously? Are the addition and multiplication operations commutative (by this we mean is A+B always the same as B+A)? Are statements (2) and (3) interchangeable, and will they always mean the same thing in a Pascal/C#/Java/Visual Basic program? In (4), can we simplify by using the algebraic rule G && G = G?

The answer, unfortunately, is no in all cases. Because side-effects are permitted, F might change M or N, and we can never be sure that both calls to F(M) will return the same value (perhaps if we peep at the internals of function F, we could be sure!)

// A dirty C# implementation of F(x)
// with a side effect on k
int k = 42;
int F(int x) { k = k + 17; return (x + k); }

So it is an easy task to construct cases for which the above transforms do not work as intended.

The problem lies in the fact that the value of an expression like F(7) does not only depend on what is written in front of our eyes, but it may depend also on the global state (values) of some memory variables. Since this state is modifiable by any subprogram (behind our back, so to speak), we lose the ability to reason about and transform the program. Operations are no longer commutative, and the most elementary algebraic laws taught in junior school are no longer valid. Once the elementary laws are violated, there is no foundation on which one can develop viable systems for reasoning about programs.

It does not solve the problem to argue that a good programmer would not do these things anyway: the fact that a bad programmer is allowed to do them prevents us from constructing general-purpose tools that might be able to effectively check our code, or parallelize and optimize our programs.

The core of the problem in languages like Java and C# is that the order of subexpression evaluation is critical. Technically, the language is not referentially transparent. Referential transparency is the property which assures that an identifier stands for only one value, and that its value can literally be substituted for the identifier without any change to the meaning of the expression. Referential transparency is the crucial ingredient that affects our ability to reason about programs. It also determines whether subexpressions can be safely executed in parallel.
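For contrast, here is a pure Haskell counterpart of F. The particular body chosen below is made up purely for illustration; the point is only that there is no hidden, updatable k:

-- The result depends only on the argument, so every occurrence of
-- (f m) denotes the same value.
f :: Int -> Int
f x = x + 59

-- Consequently the transformations questioned above are always valid:
--   f m + f m          ==  2 * f m
--   f m * n + f n * m  ==  f n * m + f m * n     (in either order, or in parallel)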

This difference is brought into even sharper contrast when we observe someone solving a simple problem with pencil and paper: partial results are calculated, scribbled onto the next fresh area on the paper, and used in further calculations. Solving a problem evidently involves manipulating values, and our concern with exactly where the intermediate results get scribbled seems inappropriate.

In functional programs the names stand for values, so that there is a direct correspondence between reasoning about the values in the problem, and reasoning within the framework of the language. By contrast, the conventional languages ask the programmer to compute by using locations which change their value as the computation proceeds. Much of the difficulty with conventional languages arises because we have to control the order of assignments to these re-useable locations, and must take this sequencing into account in order to reason about the computation.

There appear to be three ways to advance our software techniques, and to take better advantage of massively parallel hardware:

• Improve our compilers so that they can automatically detect the opportunities for parallelism. There is a persuasive counter-argument to this approach, which goes as follows: Conventional languages over-specify the solution because they impose sequencing on everything. For example, applying some well-behaved transform to every element in a vector gets over-specified in Java as

for (int i = 0; i < N; i++) { A[i] = transform(A[i]); }

Note that the solution, not the problem, now specifies a definite order of computation. A vectorizing compiler can only rediscover the opportunities for parallelism that were already present in the original problem, but were lost due to over-specification. To retrospectively discover this parallelism now requires that the compiler establish that the transform function doesn’t have side effects. In other words, serializing the operation in the first place loses some important information about the problem, and recovering that information may not be possible. Ultimately, a compiler that finds opportunities for parallelism is just getting rid of the over-specification demanded by our sequential languages.

Rather than introduce the sequentiality and then have the compiler attempt to remove it, it makes more sense not to over-constrain the solution in the first place. (A functional counterpart to the loop above is sketched just after this list.)

• Throw the problem over the fence by asking the programmer to explicitly identify the potential parallelism with new syntactic mechanisms. This can hardly improve the current state of affairs: it introduces more complexity at the programmer’s end. This is unacceptably prone to errors, and likely to worsen the bug-rates in software. It is also unlikely that explicitly identified parallelism will scale up to take advantage of many processors, rather than a handful.

• Ban the troublesome features – i.e. sequencing and variables - from our languages. This is the thesis of some of the alternative programming paradigms: data flow languages, logic programming, functional programming, and so on.
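Returning to the Java loop in the first point above, the functional rendering of the same task leaves the order of evaluation completely unspecified. The concrete transform and data below are invented for illustration only:

-- No index variable and no sequencing: an implementation is free to
-- apply transform to the elements in any order, or in parallel.
transform :: Int -> Int
transform x = x * x              -- a stand-in for any well-behaved transform

newA :: [Int]
newA = map transform [3, 1, 4, 1, 5]
-- newA  =>  [9, 1, 16, 1, 25]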

Here then are our reasons for investigating functional languages:

• They are referentially transparent. They obey the fundamental laws of substitution, which makes their results independent of the order of computation. Specifically, parallelism is safe, and does not depend on a clever compiler or on a clever programmer to identify which operations can safely be done in parallel. (It may still need a clever compiler to identify efficient parallel decompositions, but at least the correctness is not at stake!)

• The referential transparency allows us to build an algebra of programs: programs are the objects of study, and transformations are the operations. We can establish general-purpose laws about programs which can help us reason about, write, modify, prove and optimize our programs.

• The functional style exhibits some interesting commonly occurring patterns of computation. Abstracting out and studying those patterns of computation as useful objects in their own right leads to further insights into the nature of computation. The list operators studied later in these notes follow this approach.

• Functional programs are usually an order of magnitude more concise than their imperative counterparts. Besides being shorter, they can be much more readable. Some studies claim that the number of bugs per line is more or less constant, independent of the level of the language in use. Higher-level languages encode more concepts per line, and therefore have relatively fewer bugs.

• Functional programs are often more akin to formal specifications than their conventional counterparts. A good notation goes a long way towards solving the problem[47].

• Lazy evaluation permits a new approach to some algorithms. It is a simple but powerful idea that can remove the need for explicit backtracking, and can allow the programmer to manipulate infinite structures. It depends on implementing a specific policy for choosing the order in which expressions are evaluated. Varying the order of computation is only safe within the context of a well-behaved language. (A small illustration follows this list.)

• Functional languages exploit the notion that a function is a first-class object which can be passed as a parameter, returned as the result of a computation, made part of a data structure, and so on. This first-class status of functions contributes greatly to the regularity and power of abstraction in the language.

• We may be able to find new ways to write programs. For one example of this, Bird[9] sets out to write a program for breaking up text into lines. To do this he constructs a simpler function which does the opposite of what he really wants – it combines lines into text. But because the two functions are inverses of one another, he can manipulate the simpler one via well-understood rules to find its inverse. The result is the program he set out to write!

• The study of how to execute functional languages (a topic which is beyond the scope of these notes) is an interesting area in its own right, and is contributing significantly to the design of parallel computer architectures[65].
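As a small illustration of the points about lazy evaluation and first-class functions, the fragment below passes a predicate as an argument and selects from an infinite list; the names are illustrative only:

-- Laziness ensures that only as much of the infinite list of squares
-- is built as the caller actually demands.
firstSquaresOver :: (Integer -> Bool) -> Int -> [Integer]
firstSquaresOver p n = take n (filter p [ x * x | x <- [1..] ])

-- firstSquaresOver (> 50) 3  =>  [64, 81, 100]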

16.3 The Warning

This document is an enthusiastic and biased presentation of the functional style. It exploits those problems which fit nicely into the functional style, and conveniently ignores any which do not. It speaks of the possibility of complex transformational systems, richly developed algebras, and the potential for using thousands of processors. Although significant progress is occurring in these fields, this is not the current state of affairs!

The functional style has brought with it some of its own problems. Too much parallelism can swamp a system. Space and time performance of algorithms is sometimes counter-intuitive and tricky. There are some algorithms for which the known functional solutions are of a worse order than their imperative counterparts. (We have no proof that the functional formulations must be worse, so we are still trying to find equally efficient functional formulations).

On the issues of declarative and imperative languages, pure specifications cannot substitute for controlling the methods of achieving the results. As an example, consider this specification of a sort program: Given a list of elements, search through all the permutations to find those that are in ascending order, and take the first one3 as your sorted list. In Haskell the program might read

sort xs = head [p | p <- permutations xs, ascending p]

where permutations and ascending have been appropriately defined. Verifying that this program sorts correctly is easy, but it has appalling performance. It is not envisaged that purely declarative specifications will capture the essence of Quicksort or Shellsort. This quotation from Abelson and Sussman[1] gives their view of the subject:

“The computer revolution is a revolution in the way we think, and in the way we express what we think. The essence of this change is what might best be called “procedural epistemology” – the study of the structure of knowledge from an imperative point of view, as opposed to the more declarative point of view taken by classical mathematical subjects. Mathematics provides a framework for dealing precisely with notions of “what is”. Computation provides a framework for dealing precisely with notions of “how to”.”

The challenge is to somehow separate the issues of what from the issues of how, in such a way that we can exploit both. The specification should be the one and only definition of what the system must compute. It must be amenable to formal proofs and reasoning, and must be as concise and clear as possible. Additional annotations and transformations should allow the programmer to contribute the “how to” expertise, but in a constrained way that guarantees adherence to the original specifications.

16.4 The Reading

Most texts on functional programming expound motivations that are similar to those discussed here. The emphasis differs considerably from one to another, however.

Backus[4] was responsible for evoking much interest in the topic. This classic paper, especially his reasons for the new paradigm, is essential reading. His variable-free FP notation is an awkward programming medium, but the lack of variables makes the notation very suitable for deriving transformations and laws about programs, and his approach is biased in this direction.

Michaelson[52] has a delightful first chapter oriented around a comparative look at imperative and functional styles. His book emphasises a solid theoretic background, and he puts functional programming into perspective by discussing its origins and the work in related areas.

Turner’s[62] exposition considers the software engineering aspects: the cost of producing software, and the challenges of parallelism.

Glaser et al.[31] stress the declarative nature as the strong point for functional languages, and see the style as a natural way to encourage stepwise refinement and case-by-case analysis of problems.

3 Will there always be at least one permutation in ascending order? When could there ever be more than one?

Field and Harrison’s[27] work is highly recommended for further study. They use Hope as the programming vehicle, and the book contains a useful introduction to Hope, implementation techniques, transformation techniques, type checking, and semantic issues.

Hughes[41] argues that functional programming is vitally important to the real world because it promotes a degree of modularity that is just not possible in conventional languages. In Section 12.3 we show a related example that separates issues related to generating and testing a search space. He also shows that functional languages provide more powerful abstraction tools than we have available in conventional languages. In particular, he argues that modularity based on scope rules and modules is not enough: we need to be able to separate patterns of computation out of algorithms, and then glue these together for use in other situations. Section 11.2 of these notes looks at some of these issues.

Two popular books have been aimed at introductory courses for novice programmers. Neither of these concerns itself with implementation issues. It seems that a course based around texts like these could significantly cut the overheads associated with an introductory course, while still providing a solid practical and theoretical foundation for further study. Bird and Wadler[11] use a language very similar to the one in these notes, while Wikstrom[74] bases his development on Standard ML.

Kelly[50] uses a language like Haskell, but he concentrates on the algebraic manipulation of functional programs to derive parallel versions for various architectures. The book is advanced, but highly recommended.

Eisenbach’s[25] book is a collection of individual contributions on various aspects of languages, implementations, architectures and formal properties. It is an ideal introduction for someone who wants an overview.

Burge[14] is somewhat dated, but was a precursor to much of the current activity.

Reade’s excellent book[57] ranges from introductory concepts to material suitable for an advanced course at the postgraduate level. The programming vehicle is Standard ML.

Henson[37] develops the functional programming concepts from the lambda calculus: the approach consistently looks first at the underlying formalisms, theorems and laws, and then shows how these can be realized or applied in the language.

Although their book is not about functional programming per se, Abelson and Sussman’s[1] text is an extremely challenging and exciting approach to Computer Science. It uses Scheme, a modern dialect of Lisp developed for teaching. They view the central computing problem as one of coping with complexity. They ground their students in a software-engineering approach to program development right from the first line of code that they write.

Henderson’s[36] book was probably the first of the revival generation. It is based around a purely functional subset of Lisp. Lisp and its dialects are not generally as readable as the modern functional programming dialects. They hold a disadvantage for the serious functional programmer: the Lisp syntax discourages algebraic manipulation by substitution, and formulation of laws about programs. For example, even in a purely functional subset of Scheme or Lisp it is awkward to write down algebraic laws that express the notions that append is associative, or that a filtering operation distributes over append.
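In a Haskell-like notation, by contrast, such laws can be written down almost exactly as one would state them in mathematics. The property functions below are a sketch of how they might be expressed (and tested):

-- Associativity of append:         xs ++ (ys ++ zs)    ==  (xs ++ ys) ++ zs
-- Filter distributes over append:  filter p (xs ++ ys) ==  filter p xs ++ filter p ys

appendAssoc :: Eq a => [a] -> [a] -> [a] -> Bool
appendAssoc xs ys zs = xs ++ (ys ++ zs) == (xs ++ ys) ++ zs

filterDistributes :: Eq a => (a -> Bool) -> [a] -> [a] -> Bool
filterDistributes p xs ys = filter p (xs ++ ys) == filter p xs ++ filter p ys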

In the 1990s we saw the emergence of a “consensus” functional language, Haskell 1.4[39, 40, 53]. Developed by a team of contributors, it aims to be a freely available, production-quality, standardized functional language. The Hugs implementation is the one used in this course. Visit http://www.haskell.org.

17 Bibliography

[1] Abelson, H., Sussman, G. Structure and Interpretation of Computer Programs. The MIT Press, 1985.

[2] Allison, L. Circular Programs and Self-referential Structures. Software — Practice and Experience, 19(2), 1989, pp. 99–109.

[3] Augustsson, L. Compiling Pattern Matching. Functional Programming Languages and Computer Architecture, Lecture Notes in Computer Science 201, Springer-Verlag, 1985, pp. 368–381.

[4] Backus, J. Can Programming Be Liberated from the von Neumann Style? A Functional Style and Its Algebra of Programs. Comm. ACM 21(8), August 1978, pp. 613–641.

[5] Backus, J. The Algebra of Functional Programs: Function Level Reasoning, Linear Equations, and Extended Definitions. Formalization of Programming Concepts, Lecture Notes in Computer Science 107, Springer-Verlag, 1981.

[6] Bailes, P.A. G: A Functional Language with Generic Abstract Data Types. Comp. Lang. 12(2), 1987, pp. 69–84.

[7] Barth, J.M. Shifting Garbage Collection Overhead to Compile Time. Comm. ACM 20(7), July 1977, pp. 513–518.

[8] Bird, R.S. The Promotion and Accumulation Strategies in Transformational Programming. ACM Trans. Prog. Lang. and Sys. 6(4), October 1984, pp. 487–504. Addendum Ibid 7(3), July 1985, pp. 490–492.

[9] Bird, R.S. An Introduction to the Theory of Lists. Programming Research Group Technical Monograph PRG-56, Oxford University Computing Laboratory, 1986.

[10] Bird, R.S. Lectures on Constructive Functional Programming. Programming Research Group Technical Monograph PRG-69, Oxford University Computing Laboratory, 1988.

[11] Bird, R.S., Wadler, P. Introduction to Functional Programming. Prentice-Hall, 1988.

[12] Bird, R.S. Algebraic Identities for Program Calculation. Comp. Journ. 32(2), April 1989, pp. 122–126.

[13] Bellot, P. Higher Order Programming in Extended FP. Functional Programming Languages and Computer Architecture, Lecture Notes in Computer Science 201, Springer-Verlag, 1985, pp. 65–80.

[14] Burge, W.H. Recursive Programming Techniques. Addison-Wesley, 1975.

[15] Burstall, R.M., Darlington, J. A Transformation System for Developing Recursive Programs. JACM 24(1), January 1977, pp. 44–67.

[16] Burton, F.W. Annotations to Control Parallelism and Reduction Order in Distributed Evaluation of Functional Programs.

[17] Burton, F.W., Huntbach, M.M. Virtual Tree Machines. IEEE Transactions on Computers C-33(3), March 1984, pp. 278–280.

[18] Burton, F.W. Functional Programming for Concurrent and Distributed Computing. Computer Journal 30(5), 1987, pp. 437–459.

[19] Bush, V.J., Gurd, J.R. Transforming Recursive Programs for Execution on Parallel Machines. Functional Programming Languages and Computer Architecture, Lecture Notes in Computer Science 201, Springer-Verlag, 1985, pp. 350–367.

[20] Clack, C., Peyton Jones, S. Strictness Analysis — a Practical Approach. Functional Programming Languages and Computer Architecture, Lecture Notes in Computer Science 201, Springer-Verlag, 1985, pp. 35–49.

[21] Clack, C., Peyton Jones S. The Four-Stroke Reduction Engine. Proceedings of the 1986 ACM Conference on Lisp and Functional Programming, August 1986, pp. 220–232.

[22] Coxhead, P. Starting Lisp for AI. Blackwell, ? ? ? ? .

[23] Darlington, J. An Experimental Program Transformation and Synthesis System. Artificial Intelligence 16, 1981, pp. 1–46.

[24] Davie, A.J.T. An Introduction to Functional Programming Systems using Haskell. Cambridge Computer Science Texts, 1992.

[25] Eisenbach, S. Functional Programming: Languages, Tools and Architectures. Ellis Horwood, 1987.

[26] Fairbairn, J. Making Form Follow Function: An Exercise in Functional programming Style. Software — Practice and Experience 17(6), June 1987, pp. 379–387.

[27] Field, G. and Harrison, P. Functional Programming. Addison-Wesley, 1988.

[28] Fleck, A.C. Structuring FP-Style Functional Programs. Comp. Lang. 11(2), 1986, pp. 55–63.

[29] Friedman, D.P., Wise, D.S. Unbounded Computational Structures. Software — Practice and Experience, 8(4), April 1978, pp. 407–416.

[30] Friedman, D.P., Wise, D.S. Functional Combination. Comp. Lang. 3, 1978, pp. 31–35.

[31] Glaser, H., Hankin, C., Till, D. Principles of Functional Programming. Prentice-Hall, 1984.

[32] Hankin, C.L., Osmon, P.E., Shute, M.J. COBWEB — A Combinator Reduction Architecture. Functional Programming Languages and Computer Architecture, Lecture Notes in Computer Science 201, Springer-Verlag, 1985, pp. 99–112.

[33] Hankin, C.L., Burn, G.L., Peyton Jones, S.L. A Safe Approach to Parallel Combinator Reduction.

[34] Hanson, D.R. Code Improvement via Lazy Evaluation. IFIP Letters 11(4,5), Dec. 1980, pp. 163–167.

[35] Henderson, P., Morris, J.H. A Lazy Evaluator. 3rd ACM Symp. on Principles of Programming Languages, Jan. 1976, pp. 95–103.

[36] Henderson, P. Functional Programming, Application and Implementation. Prentice-Hall, 1980.

[37] Henson, M.C. Elements of Functional Languages. Blackwell, 1987.

[38] Hoffman, C.M., O’Donnell, M.J. Programming with Equations. ACM Trans. Prog. Lang. and Sys. 4(1), January 1982, pp. 83–112.

[39] Hudak, P. Conception, Evolution, and Application of Functional Programming Languages. ACM Comp. Surv. 21(3), Sept. 1989, pp. 359–411.

[40] Hudak, P., Peyton Jones, S.L., et al. Report on the Programming Language Haskell, Version 1.2. SIGPLAN Notices 27(5), May 1992.

[41] Hughes, J. Why Functional Programming Matters. Comp. Journ. 32(2), April 1989, pp. 98–107.

[42] Hughes, J. The Design and Implementation of Programming Languages. Ph.D. Thesis, Oxford University, 1984.

[43] Hughes, J. Lazy Memo Functions. Functional Programming Languages and Computer Architecture, Lecture Notes in Computer Science 201, Springer-Verlag, 1985, pp. 129–146.

[44] Ida, T. Some FP Algebra with Currying Operation. IFIP Letters 17, 1983, pp. 259–261.

[45] Ida, T., Tanaka, J. Functional Programming with Streams. Information Processing 83, R.E.A. Manson (ed.), 1983, Elsevier, pp. 265–270.

[46] Iverson, K.E. Operators. ACM Trans. Prog. Lang. and Sys. 1(2), October 1979, pp. 161–176.


[47] Iverson, K.E. Notation as a Tool of Thought. Comm. ACM 23(8), August 1980, pp. 444–465.

[48] Johnsson, T. Efficient Compilation of Lazy Evaluation. SIGPLAN Notices 19(6), June 1984, pp. 58–69.

[49] Johnsson, T. Lambda Lifting: Transforming Programs to Recursive Equations. Functional Programming Languages and Computer Architecture, Lecture Notes in Computer Science 201, Springer-Verlag, 1985, pp. 190–203.

[50] Kelly, P. Functional Programming for Loosely-coupled Multiprocessors. Pitman, 1989.

[51] MacLennan, B.J. Functional Programming, Practice and Theory. Addison-Wesley, 1990.

[52] Michaelson, G. An Introduction to Functional Programming through Lambda Calculus. Addison-Wesley, 1988.

[53] Peterson, J., et al. Haskell 1.4, A Non-strict, Purely Functional Language. http://haskell.org/onlinereport/

[54] Peyton Jones, S.L. The Implementation of Functional Programming Languages. Prentice-Hall, 1986.

[55] Peyton Jones, S.L. Parallel Implementations of Functional Programming Languages. Comp. Journ. 32(2), April 1989, pp. 175–186.

[56] Plasmeijer, R., van Eekelen, M. Functional Programming and Parallel Graph Rewriting, Addison-Wesley, 1993.

[57] Reade, C. Elements of Functional Programming. Addison Wesley, 1989.

[58] Sijtsma, B. On the Productivity of Recursive List Definitions. ACM TOPLAS 11(4), October 1989, pp. 633–649.

[59] Takeichi, M. Fully Lazy Evaluation of Functional Programs. Doctoral Thesis, Educational Computer Center, University of Tokyo, March 1987.

[60] Trinder, P. A Functional Database. PhD Thesis, Report CSC 90/R10, University of Glasgow, April 1990.

[61] Turner, D.A. A New Implementation Technique for Applicative Languages. Software — Practice and Experience, 9, 1979, pp. 31–49.

[62] Turner, D.A. Recursion Equations as a Programming Language. Functional Programming and its Applications, Eds. Darlington, J., Henderson, P., Turner, D.A. Cambridge University Press, 1982, pp. 1–28.

[63] Turner, D.A. Miranda: A Non-Strict Functional Language with Polymorphic Types. Functional Programming Languages and Computer Architecture, Lecture Notes in Computer Science 201, Springer-Verlag, 1985, pp. 1–16.

[64] Turner, D.A. Functional Programs as Executable Specifications. Mathematical Logic and Programming Languages, ed. Hoare, C.A.R., Shepardson, J., Prentice-Hall, 1985, pp. 29–54.

[65] Vegdahl, S.R. A Survey of Proposed Architectures for Execution of Functional Languages. IEEE Transactions on Computers, C-33(12), December 1984, pp. 1050–1071.

[66] Wadler, P.L. Applicative Style Programming, Program Transformation, and List Operators. Conf. on Functional Programming Languages and Computer Architecture, ACM, October 1981, pp. 25–32.

[67] Wadler, P.L. Listlessness is Better than Laziness. PhD Dissertation, Carnegie-Mellon University, 1985.

[68] Wadler, P.L. Listlessness is Better than Laziness: Lazy Evaluation and Garbage Collection at Compile Time. Conf. Record of the 1984 ACM Symposium on Lisp and Functional Programming, 1984, pp. 45–52.

[69] Wadler, P. Fixing Some Space Leaks with a Garbage Collector. Software — Practice and Experience, 17(9), September 1987, pp. 595–608.

[70] Wentworth, E.P. The Rhodes FLFM Machine. Technical Document PPG 89/7, Department of Computer Science, Rhodes University, May 1989.

[71] Wentworth, E.P. The Case for Functional Programming. Proceedings of V’th South African Computer Conference, 1989.

[72] Wentworth, E.P. Pitfalls of Conservative Garbage Collection. Software — Practice and Experience, 20(7), July 1990, pp. 719–727.

[73] Wentworth, E.P. The RUFL Reference Manual. Technical Document PPG 92/1, Department of Computer Science, Rhodes University, Jan. 1992.

[74] Wikstrom, A. Functional Programming Using Standard ML. Prentice-Hall, 1987.

[75] Witten, I.H., Neal, R.M., Cleary, J.G. Arithmetic Coding for Data Compression. Comm. ACM 30(6), 1987, pp. 520–540.

[76] Wray, S.C., Fairbairn, J. Non-Strict Languages — Programming and Implementation. Comp. Journ. 32(2), April 1989, pp. 175–186.

[77] Wright, D.A., Dekker, A.H. Strictness Analysis and Type Inference in the λ-Calculus (Summary). Technical Report RR 88-9, Dept. of Elect. Eng. and Comp. Sci., University of Tasmania, September 1988.


18 Some Useful Haskell Functions

(++):: [a] -> [a] -> [a] Append lists xs and ys. xs ++ ys

(!!):: [a] -> Int -> a Return the n'th element of xs, numbered from 0.

xs !! n

(.)::(a->b) -> (c->a) -> (c->b) Functional composition of functions f and g.

f . g

(&&):: Bool -> Bool -> Bool Logical AND of u and v. u && v

(||):: Bool -> Bool -> Bool Logical OR of u and v. u || v

(**)::Floating a => a -> a -> a Compute x^y, where y is non-integral. x ** y

^:: (Num a, Integral b) => a -> b -> a Compute x^n, where n is a non-negative exponent. x^n

^^::(Fractional a,Integral b)=> a -> b -> a Compute x^n where n may be negative or positive. x ^^ n

(/=):: Eq a => a -> a -> Bool Return True if x is not equal to y. x /= y

(\\):: Eq a => [a] -> [a] -> [a] (called bagdiff )

Return the elements in xs that are not eliminated by matching elements in ys. In module List.

xs \\ ys

(==):: Eq a => a -> a -> Bool Return True if x is equal to y. x == y

(>=):: Ord a => a -> a -> Bool Return True if x is greater or equal to y.

x >= y

(<=):: Ord a => a -> a -> Bool Return True if x is less or equal to y. x <= y

(<):: Ord a => a -> a -> Bool Return True if x is less than y. x < y

(>):: Ord a => a -> a -> Bool Return True if x is greater than y. x > y

abs:: Num a => a -> a Returns the absolute value of x. abs x

all:: (a->Bool) -> [a] -> Bool Returns True if (pred x) is True for every element x of xs.

all pred xs

and:: [Bool] -> Bool AND together all the elements of xs. and xs


any:: (a->Bool) -> [a] -> Bool Returns True if (pred x) is True for any element x of xs.

any pred xs

chr:: Int -> Char Returns the character corresponding to ASCII value k. You will need import Char to load the relevant library module.

chr k

compare:: Ord a => a -> a -> Ordering Compare two values of the same type,

and return an Ordering value.

concat:: [[a]] -> [a] Returns the result of appending all the lists in xxs together.

concat xxs

const:: a -> b -> a Returns k, ignores x. const k x

cos:: Floating a => a -> a Calculate (cos x), where x is in radians.

cos x

cycle:: [a] -> [a] Creates a new (infinite) list as a repeating cycle of the elements in xs.

cycle xs

delete::(Eq a) => a->[a]->[a] Deletes first x from xs, if it exists. In module List

delete 'a' "sam"

div:: Integral a => a -> a -> a Returns n divided by k, using integral division.

n `div` k

divMod::Integral a=>a->a->(a,a) Returns the pair (div n k, mod n k)

n `divMod` k

drop:: Int -> [a] -> [a] Drops the first n elements from xs (or returns [] if xs is too short).

drop n xs

dropWhile:: (a->Bool)->[a]->[a] Drops elements from xs until it encounters the first x from xs for which (p x) is False.

dropWhile p xs

elem:: Eq a => a -> [a] -> Bool Returns True if e is an element of xs. e `elem` xs

even:: Integral a => a -> Bool Returns True if n is an even number. even n

exp:: Floating a => a -> a Returns e^x. exp x

filter::(a->Bool) -> [a] -> [a] Returns a list of all x in xs for which (pred x) is True.

filter pred xs


flip:: (a->b->c) -> (b->a->c) Returns a modified version of op which expects its arguments in the opposite order.

flip op

foldl::(a->b->a)->a->[b]->a foldl reduces xs from the left, by applying op to each successive result and the next element of xs. iv provides the initial value for the accumulating parameter, i.e. the left argument of the very first application of op.

foldl op iv xs

foldl1:: (a->a->a) -> [a] -> a Like foldl, except it uses the first x in xs as its initial value. xs must be non-empty.

foldl1 op xs

foldr::(a->b->b)->b->[a]-> b foldr reduces xs from the right, by applying op to each list element and the result of folding the tail. iv provides the initial value for the right argument of the very rightmost application of op.

foldr op iv xs

foldr1:: (a->a->a) -> [a] -> a Like foldr, except it uses the last x in xs as its initial value. xs must be non-empty.

foldr1 op xs
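For example, because subtraction is not associative the two folds group the same list differently:

-- foldl (-) 100 [1,2,3]  =  ((100 - 1) - 2) - 3  =  94
-- foldr (-) 100 [1,2,3]  =  1 - (2 - (3 - 100))  =  -98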

fromInteger::Num a =>Integer->a Convert n into a Double, Float, Int, etc. See toInteger.

fromInteger n


fst:: (a,b) -> a Returns the first component of a pair. fst (u,v)

getArgs:: IO [String] Returns a list of command-line args. In System module.

as <- getArgs

gcd:: Integral a => a -> a -> a Returns the greatest common divisor of k and n.

gcd k n

groupBy:: (a->a->Bool) -> [a] -> [[a]] Groups adjacent similar elements,

using the predicate to decide similarity to the first in the group.

groupBy (==) xs


head:: [a] -> a Returns the first element of non-empty list xs.

head xs

id:: a -> a Identity function, returns x. id x

init:: [a] -> [a] Drops the rightmost element of non-empty list xs.

init xs

intToDigit:: Int -> Char Converts an integer in the range [0..15] into one of the characters '0', '1' .. 'f'. You will need import Char to load the relevant library module.

intToDigit n

iterate:: (a->a) -> a -> [a] Returns the infinite list [x, f x, f (f x), f (f (f x))...]

iterate f x

last:: [a] -> a Returns the last element of non-empty list xs.

last xs

lcm::Integral a => a -> a -> a Returns the lowest common multiple of n and k.

lcm n k

length:: [a] -> Int Returns the number of elements in xs. length xs

lines:: String -> [String] Breaks a string into separate lines wherever a newline character is found.

lines s

log:: Floating a => a -> a Computes the natural logarithm of x. log x

logBase::Floating a => a -> a -> a Computes the base b logarithm of x. logBase b x

map:: (a->b) -> [a] -> [b] Each element of the result list is obtained by applying fun to the corresponding element in xs.

map fun xs

max:: Ord a => a -> a -> a Returns the larger of x and y. x `max` y

maximum:: Ord a => [a] -> a Returns the largest item in non-empty list xs.

maximum xs

min:: Ord a => a -> a -> a Returns the smaller of x and y. x `min` y

minimum:: Ord a => [a] -> a Returns the smallest item in non-empty list xs.

minimum xs


mod::Integral a => a -> a -> a Returns remainder of n divided by k, using integral division.

n `mod` k

negate:: Num a => a -> a Returns -x. negate x

not:: Bool -> Bool Returns the Boolean complement of b. not b

notElem::Eq a => a->[a]->Bool Returns True if e is not an element of xs.

e `notElem` xs

null:: [a] -> Bool Returns True if xs is the empty list. null xs

nub:: Eq a => [a] -> [a] Removes duplicates from a list. In particular, keeps only the first occurrence of each element.

nub [2,3,2,5,1,5]

odd:: Integral a => a -> Bool Returns True if n is an odd number. odd n

or:: [Bool] -> Bool OR together all the elements in xs. or xs

ord:: Char -> Int Returns the ASCII representation of c. You will need import Char to load the relevant library module.

ord c

pi:: Floating a => a 3.1415926... pi

product:: Num a => [a] -> a Multiply together all the elements of list xs.

product xs

putStr:: String -> IO () Output a string to stdout. putStr "hello"

putStrLn:: String -> IO () Output a string and newline to stdout putStrLn "hello"

readFile:: FilePath -> IO String Read the file contents into a String s<- readFile path

recip:: Fractional a => a -> a Return 1 / x. recip x

repeat:: a -> [a] Return an infinite list [e,e,e,e,e,...]

repeat e

replicate:: Int -> a -> [a] Return a list containing n occurrences of e.

replicate n e

reverse:: [a] -> [a] Return a list containing the elements of xs in reversed order.

reverse xs


scanl:: (a->b->a)->a->[b]->[a] Like foldl, but returns the list of all the intermediate values of the accumulating parameter, beginning with iv. scanl op iv xs

scanl1:: (a->a->a)->[a]->[a] Like scanl, except that the first element of non-empty xs is used as the initial value iv.

scanl1 op xs

scanr::(a->b->b) -> b->[a]->[b] Like foldr, but returns the list of all the intermediate results, ending with iv. scanr op iv xs

scanr1::(a->a->a) -> [a] -> [a] Like scanr, except that the last element of non-empty xs is used as the initial value iv.

scanr1 op xs
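For example, the scans expose the intermediate results of the corresponding folds:

-- scanl (+) 0 [1,2,3]  =  [0,1,3,6]    (running totals from the left)
-- scanr (+) 0 [1,2,3]  =  [6,5,3,0]    (running totals from the right)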

show::Show a => a -> String Returns a string representation of x. show x

sin::Floating a => a -> a Calculate (sin x), where x is in radians.

sin x

snd:: (a,b) -> b Returns the second component of a pair.

snd (u,v)

sort:: (Ord a) => [a] -> [a] Returns a sorted version of the list. In module List

sort "hello"

sortBy::(a -> a -> Ordering)-> [a]-> [a]

Returns a sorted version of the list, but uses the first argument to order elements. Use if you want ordering to be based on one field of a pair, for example. In module List

sortBy (\ u v -> compare (snd u) (snd v)) es

span::(a->Bool)->[a]->([a],[a]) Break xs into two pieces, (takeWhile p xs, dropWhile p xs).

span p xs

splitAt::Int->[a]->([a],[a]) Break xs after the n'th element into two pieces. Equivalent to (take n xs, drop n xs)

splitAt n xs
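For example:

-- span  even [2,4,1,3,6]  =  ([2,4], [1,3,6])
-- splitAt 2 "haskell"     =  ("ha", "skell")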

sqrt::Floating a => a -> a Calculate the square root of x. sqrt x

sum:: Num a => [a] -> a Sum together all the elements of list xs.

sum xs

tail:: [a] -> [a] Return the tail of a non-empty list xs. tail xs


take:: Int -> [a] -> [a] Returns the list comprising the first n elements of xs, or as many as it can if xs is too short.

take n xs

takeWhile::(a->Bool)->[a]->[a] Returns the list comprising the first elements x of xs for which (pred x) is True.

takeWhile pred xs

tan:: Floating a => a -> a Calculate (tan x), where x is in radians.

tan x

toInteger::Integral a => a->Integer Converts one of the Integral types to

an Integer. Compare fromInteger. toInteger x

truncate:: (RealFrac a, Integral b) => a -> b

Returns the integral part of a real number.

truncate x

unlines:: [String] -> String Combines a list of lines into a single string by concatenating them together with terminating newline characters after each line.

unlines ls

until:: (a->Bool)->(a->a)->a->a Iterate function f on initial value x until the result satisfies the predicate. If the initial x already satisfies the predicate, no iterations will occur.

until pred f x

unwords:: [String] -> String Combines a list of words into a single string by concatenating them together with separating space characters between each word.

unwords ws

unzip:: [(a,b)] -> ([a],[b]) Separate a list of pairs into a pair of lists.

unzip ps

unzip3::[(a,b,c)] -> ([a],[b],[c]) Separate a list of triples into a triple

of lists. unzip3 ts

words:: String -> [String] Breaks str into separate words wherever any whitespace (space or newline character) is found.

words str

writeFile:: FilePath -> String -> IO() Write a string to the named file. writeFile path s


zip:: [a] -> [b] -> [(a,b)] Zips two lists, element by element, into a list of pairs. The length of the result will be the length of the shorter of xs and ys.

zip xs ys

zip3::[a]->[b]->[c]->[(a,b,c)] Zips three lists, element by element, into a list of triples. The length of the result will be the length of the shortest of the three input lists.

zip3 xs ys zs

zipWith:: (a->b->c)->[a]->[b]-> [c] Applies binary function op to

corresponding elements of xs and ys, and produces a list of results.

zipWith (+) xs ys

zipWith3:: (a->b->c->d)-> [a]->[b]->[c]->[d]

Applies 3-argument function f to corresponding elements of xs, ys and zs, and produces a list of results.

zipWith3 f xs ys zs


19 Index

:, 10

++, 26

accumulating parameter, 30, 33, 45

append, 26

arity

constructor, 51

type, 52

ART puzzle, 93

backtracking, 87

Backus’ FP, 111

bag difference, 45

bottom, 85

c, 19

call by need, 85

Church-Rosser theorem, 85

circular structures, 100

class, 21

closure, 90

co-domain, 70

combinator, 73

const, 73

flip, 73

constructor, 10, 48

context, 21

coroutine, 88, 103

cost function, 26

Coxhead, 92

data constant, 51

data constructor, 10, 51

data declarations, 48, 60

field names, 61

debugging, 91

declaration, 8

definition, 8

local, 31

domain, 70

drop, 26

eight queens problem, 87

Eq, 22

evaluation

eager, 85

lazy, 85

order of, 85, 108

extensionality, principle of, 72, 79

Fibonacci numbers, 33, 102

filter, 41

fixed-point, 90

flatten, 52

foldr, 77

fst, 36

function

curried, 71, 72

currying, 10

first-order, 70

higher-order, 70

partial, 15, 70

partially applied, 71, 72

total, 70

function composition, 73

functionals, 70


generators, 41

getArgs, 94

Goldbach’s conjecture, 46

greatest common divisor, 32

groupBy, 62

guard, 14

Hamming numbers, 101

head, 10, 14

height-balanced tree, 58

homomorphism, 81

identifier, bound and free, 70

infinite objects, 86

infix functions, 8

Integral, 23

IO, 96

lambda expressions, 60

lambda lifting, 88

let, 31

list comprehension, 42

list operators, 41, 77

listless style, 82

map, 41

memo function, 33

minimum, 89

modularity, 88

Monad, 95

name equivalence, 36

Num, 23

operator

associativity, 71

infix, 26, 70

precedence, 71

optimization, 63

Ord, 22

order of algorithm, 26

pair, 36

parallelism, 89, 108

pattern

alias, 15

decomposition, 32

irrefutable, 15

literal, 14

matching, 13

order of testing, 15

underscore, 15

permutations, 45

polymorphism, 19

predicate, 41

prefix functions, 8

prelude, 9

prime numbers, 86, 105

producer/consumer model, 88

putStr, 97

putStrLn, 97

Quicksort, 44

readFile, 94

recurrence relation, 33

recursion

backward, 29

forward, 30, 33

linear, 30

tail, 30, 33

reduction

outermost, 86


referential transparency, 63, 108

scope rules, 43, 88

section, 72

Show, 23

side effect, 107

Sieve of Eratosthenes, 86

singleton law, 67

snd, 36

sortBy, 62

sorting, 89

space leak, 91

stack frame, 30

strict argument, 90

strictness analysis, 90

String, 11, 36

structural equivalence, 36

structural induction, 64

style, imperative/declarative, 105

tail, 10, 14

take, 26

tree, AVL balanced, 58

tree, bounded-balanced, 59

tree, weight-balanced, 59

tuple, 36

type class, 21

type signature, 9

type synonym, 11, 36

type variable, 19

von Neumann machine, 106

where, 31

writeFile, 94

yield, 89

ZF expressions, 42