Compelling example of a monad: Parsing
Parsing means taking a stream of characters or “tokens” and interpreting them as objects in some language.
"-3.115"
as a string can be interpreted to -3.115 as a Float.
"a+3*b"
as an arithmetic tree.
Parse English sentences (to some extent).
"Jack went to the store." -->
SENTENCE(SUBJECT "Jack", VERB "went",
PP (Prep "to" (NP (ARTICLE "the", OBJECT "store")))
Pluck bits of data out of some text (scraping).
Why this type? The first String is the input – the stream of characters we are to parse.
The function returns a LIST because there can be multiple valid ways to parse the same string. Example: "a+b+c"
could be grouped either to the left or the right: (ADD (ADD a b) c)
or (ADD a (ADD b c))
.
Real ambiguity from a real programming language: C/C++:
if() __ else __
if() { if() __ } else __ // else attaches to outer if
if() { if() __ else __ } // else attaches to inner if
But in C/C++ the curly braces are optional, so without them this is ambiguous.
The result is a PAIR of (a, String). The ‘a’ component represents the value that was parsed (of type a). The String represents any LEFTOVER characters that weren’t used yet.
Example: parsing an integer
" -410xyz" --> [(-410, "xyz")]
A COMPLETE parse is when there are no characters leftover. But incomplete parses are used while parsing is in progress.
"32+51" --> [(32, "+51")] is the first step
eventually it gets to
--> [(BINOP(+,32,51), "")] as the complete parse.
Match a particular character. So char ‘A’ it should match an ‘A’ on the input stream, but it doesn’t match any other character.
satisfies :: (Char -> Bool) -> Parser Char
satisfies ok = Parser f
where f [] = []
f (d:ds)
| ok d = [(d,ds)]
| otherwise = []
twoChars (satisfies isAlpha) (satisfies isDigit) "A3"
--> [('A', "3")] -- After running first parser, isAlpha
--> [("A3", "")] -- Apply second parser to leftover string(s)
twoChars :: Parser Char -> Parser Char -> Parser String
twoChars p1 p2 = Parser stringParser
where stringParser initialString = concatMap nextParser (runParser p1 initialString)
nextParser (firstChar, restString) = concatMap finalResult (runParser p2 restString)
where finalResult (secondChar, remaining) =
[([firstChar,secondChar], remaining)]
These parser definitions can become much simpler if we recognize that Parser is a monad.
(All monads are also “Applicative functors” and “functors”, so declare those too.)
instance Functor Parser where
-- fmap :: (a -> b) -> Parser a -> Parser b
fmap f p = Parser g
where g s = map (\(a,r) -> (f a, r)) (runParser p s)
instance Applicative Parser where
-- pure: a -> Parser a
pure a = Parser (\s -> [(a,s)])
-- <*> :: Parser (a -> b) -> Parser a -> Parser b
-- liftA2 :: (a -> b -> c) -> Parser a -> Parser b -> Parser c
liftA2 combine pa pb = Parser f
where f s1 = concatMap g (runParser pa s1)
g (a,s2) = concatMap h (runParser pb s2)
where h (b,s3) = [(combine a b, s3)]
instance Monad Parser where
-- (>>=) :: Parser a -> (a -> Parser b) -> Parser b
pa >>= fpb = Parser g
where g s1 = concatMap h (runParser pa s1)
where h (a,s2) = runParser (fpb a) s2
Alternative allows a choice – empty
is failure and (<|>)
is like “or else”.
instance Alternative Parser where
-- empty :: Parser a
empty = Parser (\_ -> [])
p1 <|> p2 = Parser g
where g s1 =
case runParser p1 s1 of
[] -> runParser p2 s1 -- p1 failed, so try p2
results -> results -- otherwise p1's results are fine
Building a parser out of these components.
This parseInt
works for sequences of digits, optionally followed by a negative sign:
ghci> runParser parseInt "13"
[("13","")]
ghci> runParser parseInt "5221x"
[("5221","x")]
ghci> runParser parseInt "-5221x"
[("-5221","x")]
ghci> runParser parseInt "- 5221x"
[]
ghci> runParser parseInt " -19"
[("-19","")]