Notes week of 29 Oct

Lambda notation

“Lambda” is a notation for an anonymous function. The name comes from the Greek lowercase letter λ, and the symbol we use for it in Haskell is the backslash: \.

Normally when you define a function, you put its name and then its arguments, and then equals, and body.

And then you can use that function in various ways:

ghci> map square [3..9]
ghci> square 15

Instead, of naming that simple function separately, you can specify it “inline”, without any name, by using the syntax \x -> x*x. The argument(s) are named between the backslash and the arrow, and the body follows the arrow. So here are the previous examples but using a lambda instead:

ghci> map (\x -> x*x) [3..9]
ghci> (\x -> x*x) 15

In each use, the parentheses around the lambda expression are required. They tell Haskell how far the body of the function extends. If you instead tried to type

ghci> \x -> x*x 15

it would generate an error, because it tries to include the 15 into the body of the function, so the body is an expression like this:

x * (x 15)

Which means we’re trying to use x both as a number (in the multiplication) and as a function (in applying it to the argument 15). That conflict is the source of the error message we see.

You can still assign a lambda to a name if you want to, like this:

and that’s equivalent to the usual definition of square previously. We just moved the x argument to the right side of the equals by putting it behind a lambda.

Arguments can migrate in lots of ways – all these definitions would be equivalent (assuming the missing bodies are the same).

Pseudo-random numbers

We want random numbers, but a normal random() function like in Python would not be pure because it returns different results each time it is called with the same arguments.

 Python 3.6.6 (default, Jun 27 2018, 05:47:41)
 >>> from random import random
 >>> random()
 >>> random()
 >>> random()
 >>> random()

The strategy for making a pure version of random is to expose the “state” of the random number generator. Here is a definition of an LCG – Linear Congruential (pseudo-random) Generator.

The constants in here are chosen carefully to have particular effects. The multiplier 16807 ensures that the results are fairly evenly distributed, and the modulus 0x7FFFFFFF clamps the range to the range of 31-bit integers (same as non-negative 32-bit integers).

But this is just a formula, and if the seed is known then the results are predictable. Here’s how to get a sequence of random numbers:

ghci> rand (Seed 294992)
(662963250,Seed {unSeed = 662963250})
ghci> rand (Seed 662963250)
(1278182114,Seed {unSeed = 1278182114})
ghci> rand (Seed 1278182114)
(1127869057,Seed {unSeed = 1127869057})

The initial seed value, 294992, was chosen arbitrarily by bashing on the number row of my keyboard. From there, we take the seed returned by the previous call to rand and use it in the next call. This generates the sequence of three pseudo-random 31-bit numbers, which here are 662963250, then 1278182114, then 1127869057.

But whenever you give the same seed, you get the same number. Therefore the function is still pure.

ghci> rand (Seed 662963250)
(1278182114,Seed {unSeed = 1278182114})

The type of rand, Seed -> (Int, Seed) is a common pattern that we could call a generator. A more generic form of it could be represented as

It represents a function that takes an incoming state of type s, produces a result of type a, along with an outgoing state, also of type s. A common pattern when using a generator of this shape is to thread the state through multiple calls to the generator.

For example, suppose we want to write one function to generate three random numbers. That means calling rand three times and assembling the results into a list or tuple:

Notice how this takes an s0, threads it into one rand which produces s1. Then it takes s1 and threads it into the next rand, and so on. If any parameter in that sequence is out of place (if s1 is duplicated somewhere, for example), then we’ll get a duplication of numbers. Each seed should be used only once.

Here’s a slightly more sophisticated version that can generate a list of random numbers of a specified size:

It still exhibits the threading of s0 to s1 to s2.

Generalizing generators

Let’s play with a different function with the same shape, s -> (a,s). Instead of numbers, this will represent a counter that can be incremented.

ghci> increment (Counter 5)
(5,Counter 6)
ghci> increment (Counter 6)
(6,Counter 7)

Here is the analogue of threeRandoms, it has exactly the same state-passing structure.

ghci> threeIncrements (Counter 5)
([5,6,7],Counter 8)
ghci> threeIncrements (Counter 10)
([10,11,12],Counter 13)

This may seem kind of pointless, but it allows us to explore an interesting pattern.

Most operations we explored on trees – flipTree, fmap, height – are compositional. That means that the recursion down the left and right side are independent of each other.

But with some operations and transformations, the left and right are not independent. They may require threading state through each branch in the traversal. (After processing the left subtree we need to thread the state into the right subtree.)

One such operation is value-numbering the tree. That means using a counter as the state, and pairing branch values with unique, consecutive integers. Here’s the tree data type:

Some example trees:

And here is where we use the counter increment. Notice the same s0, s1, s2, s3 pattern.

ghci> numberTree t1 (Counter 1)
(Branch (1,'B')
  (Branch (2,'F') Leaf Leaf)
  (Branch (3,'C')
    (Branch (4,'D') Leaf Leaf)
    (Branch (5,'E') Leaf Leaf)),
 Counter 6)

ghci> numberTree t1 (Counter 10)
(Branch (10,'B')
  (Branch (11,'F') Leaf Leaf)
  (Branch (12,'C')
    (Branch (13,'D') Leaf Leaf)
    (Branch (14,'E') Leaf Leaf)),
 Counter 15)

ghci> numberTree t2 (Counter 1)
(Branch (1,'B')
  (Branch (2,'F')
    (Branch (3,'K') Leaf Leaf)
    (Branch (4,'Z')
      (Branch (5,'G') Leaf Leaf)))
  (Branch (6,'C')
    (Branch (7,'D') Leaf Leaf)
    (Branch (8,'E') Leaf Leaf)),
 Counter 9)