Notes week of 19 Nov

Compelling example of a monad: Parsing

Parsing means taking a stream of characters or “tokens” and interpreting them as objects in some language.

"-3.115" as a string can be interpreted to -3.115 as a Float.

"a+3*b" as an arithmetic tree.

Parse English sentences (to some extent).

"Jack went to the store." -->
     PP (Prep "to" (NP (ARTICLE "the", OBJECT "store")))

Pluck bits of data out of some text (scraping).

Why this type? The first String is the input – the stream of characters we are to parse.

The function returns a LIST because there can be multiple valid ways to parse the same string. Example: "a+b+c" could be grouped either to the left or the right: (ADD (ADD a b) c) or (ADD a (ADD b c)).

Real ambiguity from a real programming language: C/C++:

   if() __ else __

   if() { if() __ } else __     // else attaches to outer if
   if() { if() __ else __ }     // else attaches to inner if

But in C/C++ the curly braces are optional, so without them this is ambiguous.

The result is a PAIR of (a, String). The ‘a’ component represents the value that was parsed (of type a). The String represents any LEFTOVER characters that weren’t used yet.

Example: parsing an integer

    " -410xyz" --> [(-410, "xyz")]

A COMPLETE parse is when there are no characters leftover. But incomplete parses are used while parsing is in progress.

   "32+51" --> [(32, "+51")]   is the first step
eventually it gets to
    --> [(BINOP(+,32,51), "")] as the complete parse.

Match a particular character. So char ‘A’ it should match an ‘A’ on the input stream, but it doesn’t match any other character.

twoChars (satisfies isAlpha) (satisfies isDigit) "A3"
  --> [('A', "3")]      -- After running first parser, isAlpha
  --> [("A3", "")]      -- Apply second parser to leftover string(s)

These parser definitions can become much simpler if we recognize that Parser is a monad.

(All monads are also “Applicative functors” and “functors”, so declare those too.)

Alternative allows a choice – empty is failure and (<|>) is like “or else”.

Building a parser out of these components.

This parseInt works for sequences of digits, optionally followed by a negative sign:

ghci> runParser parseInt "13"
ghci> runParser parseInt "5221x"
ghci> runParser parseInt "-5221x"
ghci> runParser parseInt "- 5221x"
ghci> runParser parseInt "   -19"