Compelling example of a monad: Parsing

Parsing means taking a stream of characters or “tokens” and interpreting them as objects in some language.

`"-3.115"`

as a string can be interpreted to -3.115 as a Float.

`"a+3*b"`

as an arithmetic tree.

Parse English sentences (to some extent).

```
"Jack went to the store." -->
SENTENCE(SUBJECT "Jack", VERB "went",
PP (Prep "to" (NP (ARTICLE "the", OBJECT "store")))
```

Pluck bits of data out of some text (scraping).

Why this type? The first String is the input – the stream of characters we are to parse.

The function returns a LIST because there can be multiple valid ways to parse the same string. Example: `"a+b+c"`

could be grouped either to the left or the right: `(ADD (ADD a b) c)`

or `(ADD a (ADD b c))`

.

Real ambiguity from a real programming language: C/C++:

```
if() __ else __
if() { if() __ } else __ // else attaches to outer if
if() { if() __ else __ } // else attaches to inner if
```

But in C/C++ the curly braces are optional, so without them this is ambiguous.

The result is a PAIR of (a, String). The ‘a’ component represents the value that was parsed (of type a). The String represents any LEFTOVER characters that weren’t used yet.

Example: parsing an integer

` " -410xyz" --> [(-410, "xyz")]`

A COMPLETE parse is when there are no characters leftover. But incomplete parses are used while parsing is in progress.

```
"32+51" --> [(32, "+51")] is the first step
eventually it gets to
--> [(BINOP(+,32,51), "")] as the complete parse.
```

Match a particular character. So char ‘A’ it should match an ‘A’ on the input stream, but it doesn’t match any other character.

```
satisfies :: (Char -> Bool) -> Parser Char
satisfies ok = Parser f
where f [] = []
f (d:ds)
| ok d = [(d,ds)]
| otherwise = []
```

```
twoChars (satisfies isAlpha) (satisfies isDigit) "A3"
--> [('A', "3")] -- After running first parser, isAlpha
--> [("A3", "")] -- Apply second parser to leftover string(s)
```

```
twoChars :: Parser Char -> Parser Char -> Parser String
twoChars p1 p2 = Parser stringParser
where stringParser initialString = concatMap nextParser (runParser p1 initialString)
nextParser (firstChar, restString) = concatMap finalResult (runParser p2 restString)
where finalResult (secondChar, remaining) =
[([firstChar,secondChar], remaining)]
```

These parser definitions can become much simpler if we recognize that Parser is a monad.

(All monads are also “Applicative functors” and “functors”, so declare those too.)

```
instance Functor Parser where
-- fmap :: (a -> b) -> Parser a -> Parser b
fmap f p = Parser g
where g s = map (\(a,r) -> (f a, r)) (runParser p s)
```

```
instance Applicative Parser where
-- pure: a -> Parser a
pure a = Parser (\s -> [(a,s)])
-- <*> :: Parser (a -> b) -> Parser a -> Parser b
-- liftA2 :: (a -> b -> c) -> Parser a -> Parser b -> Parser c
liftA2 combine pa pb = Parser f
where f s1 = concatMap g (runParser pa s1)
g (a,s2) = concatMap h (runParser pb s2)
where h (b,s3) = [(combine a b, s3)]
```

```
instance Monad Parser where
-- (>>=) :: Parser a -> (a -> Parser b) -> Parser b
pa >>= fpb = Parser g
where g s1 = concatMap h (runParser pa s1)
where h (a,s2) = runParser (fpb a) s2
```

Alternative allows a choice – `empty`

is failure and `(<|>)`

is like “or else”.

```
instance Alternative Parser where
-- empty :: Parser a
empty = Parser (\_ -> [])
p1 <|> p2 = Parser g
where g s1 =
case runParser p1 s1 of
[] -> runParser p2 s1 -- p1 failed, so try p2
results -> results -- otherwise p1's results are fine
```

Building a parser out of these components.

This `parseInt`

works for sequences of digits, optionally followed by a negative sign:

```
ghci> runParser parseInt "13"
[("13","")]
ghci> runParser parseInt "5221x"
[("5221","x")]
ghci> runParser parseInt "-5221x"
[("-5221","x")]
ghci> runParser parseInt "- 5221x"
[]
ghci> runParser parseInt " -19"
[("-19","")]
```