CS168 - Notes week of 19 Nov

import GHC.Base (liftA2)
import Control.Applicative
import Data.Char

Compelling example of a monad: Parsing

Parsing means taking a stream of characters or “tokens” and interpreting them as objects in some language.

"-3.115" as a string can be interpreted to -3.115 as a Float.

"a+3*b" as an arithmetic tree.

Parse English sentences (to some extent).

"Jack went to the store." -->
  SENTENCE(SUBJECT "Jack", VERB "went",
     PP (Prep "to" (NP (ARTICLE "the", OBJECT "store")))

Pluck bits of data out of some text (scraping).

data Parser a =
  Parser { runParser :: String -> [(a, String)] }

Why this type? The first String is the input – the stream of characters we are to parse.

The function returns a LIST because there can be multiple valid ways to parse the same string. Example: "a+b+c" could be grouped either to the left or the right: (ADD (ADD a b) c) or (ADD a (ADD b c)).

Real ambiguity from a real programming language: C/C++:

   if() __ else __

   if() { if() __ } else __     // else attaches to outer if
   if() { if() __ else __ }     // else attaches to inner if

But in C/C++ the curly braces are optional, so without them this is ambiguous.

The result is a PAIR of (a, String). The ‘a’ component represents the value that was parsed (of type a). The String represents any LEFTOVER characters that weren’t used yet.

Example: parsing an integer

    " -410xyz" --> [(-410, "xyz")]

A COMPLETE parse is when there are no characters leftover. But incomplete parses are used while parsing is in progress.

   "32+51" --> [(32, "+51")]   is the first step
eventually it gets to
    --> [(BINOP(+,32,51), "")] as the complete parse.

Match a particular character. So char ‘A’ it should match an ‘A’ on the input stream, but it doesn’t match any other character.

char :: Char -> Parser Char
char c = satisfies (== c)

satisfies :: (Char -> Bool) -> Parser Char
satisfies ok = Parser f
  where f [] = []
        f (d:ds)
          | ok d = [(d,ds)]
          | otherwise = []

twoChars (satisfies isAlpha) (satisfies isDigit) "A3"
  --> [('A', "3")]      -- After running first parser, isAlpha
  --> [("A3", "")]      -- Apply second parser to leftover string(s)

twoChars :: Parser Char -> Parser Char -> Parser String
twoChars p1 p2 = Parser stringParser
  where stringParser initialString = concatMap nextParser (runParser p1 initialString)
        nextParser (firstChar, restString) = concatMap finalResult (runParser p2 restString)
            where finalResult (secondChar, remaining) =
                     [([firstChar,secondChar], remaining)]

These parser definitions can become much simpler if we recognize that Parser is a monad.

(All monads are also “Applicative functors” and “functors”, so declare those too.)

instance Functor Parser where
  -- fmap :: (a -> b) -> Parser a -> Parser b
  fmap f p = Parser g
    where g s = map (\(a,r) -> (f a, r)) (runParser p s)

instance Applicative Parser where
  -- pure: a -> Parser a
  pure a = Parser (\s -> [(a,s)])
  -- <*> :: Parser (a -> b) -> Parser a -> Parser b
  -- liftA2 :: (a -> b -> c) -> Parser a -> Parser b -> Parser c
  liftA2 combine pa pb = Parser f
    where f s1 = concatMap g (runParser pa s1)
          g (a,s2) = concatMap h (runParser pb s2)
             where h (b,s3) = [(combine a b, s3)]

instance Monad Parser where
  -- (>>=) :: Parser a -> (a -> Parser b) -> Parser b
  pa >>= fpb = Parser g
    where g s1 = concatMap h (runParser pa s1)
            where h (a,s2) = runParser (fpb a) s2

Alternative allows a choice – empty is failure and (<|>) is like “or else”.

instance Alternative Parser where
  -- empty :: Parser a
  empty = Parser (\_ -> [])
  p1 <|> p2 = Parser g
    where g s1 =
            case runParser p1 s1 of
               [] -> runParser p2 s1   -- p1 failed, so try p2
               results -> results      -- otherwise p1's results are fine

Building a parser out of these components.

optionalNeg :: Parser String
optionalNeg =
   ignoreSpaces >> ((pure <$> char '-') <|> pure "")

ignoreSpaces :: Parser String
ignoreSpaces =
   many (satisfies isSpace) >> pure ""

parseInt :: Parser String
parseInt =
  (++) <$> optionalNeg <*> some (satisfies isDigit)

This parseInt works for sequences of digits, optionally followed by a negative sign:

ghci> runParser parseInt "13"
[("13","")]
ghci> runParser parseInt "5221x"
[("5221","x")]
ghci> runParser parseInt "-5221x"
[("-5221","x")]
ghci> runParser parseInt "- 5221x"
[]
ghci> runParser parseInt "   -19"
[("-19","")]

main :: IO ()
main = return ()

Source