Sun Dec 2
Note about submission: The format of this assignment on Mimir is different than previous ones. You may use the web IDE on Mimir, or an editor and GHC on your local system.
If using your own system, you can download the starter code as a Zip file from Mimir (but there is only one file in it, A10.hs
). You can then upload the modified file.
If using the web IDE on Mimir, use this command in the terminal to install GHC:
apt install ghc
You might need to repeat that whenever you reopen the IDE? The starter code can be found within the folders at the left, and/or get there this way in the terminal:
cd cs168_functional_programming/assignment_10
ghci
:load A10.hs
This assignment includes the beginnings of an expression parser, that can scan through a stream of characters and build a tree to match arithmetic expressions. We begin with all the type and instance definitions that we covered in class…below this code there will be a friendlier summary of the available operations.
instance Functor Parser where
-- fmap :: (a -> b) -> Parser a -> Parser b
fmap f p = Parser g
where g s = mapFst f (runParser p s)
instance Applicative Parser where
-- pure: a -> Parser a
pure a = Parser (\s -> [(a,s)])
-- <*> :: Parser (a -> b) -> Parser a -> Parser b
pf <*> pa = Parser g
where g s1 = concatMap h (runParser pf s1)
where h (f, s2) = mapFst f (runParser pa s2)
instance Monad Parser where
-- (>>=) :: Parser a -> (a -> Parser b) -> Parser b
pa >>= fpb = Parser g
where g s1 = concatMap h (runParser pa s1)
where h (a,s2) = runParser (fpb a) s2
instance Alternative Parser where
-- empty :: Parser a
empty = Parser (\_ -> [])
-- (<|>) :: Parser a -> Parser a -> Parser a
p1 <|> p2 = Parser g
where
g s1 =
case runParser p1 s1 of
[] -> runParser p2 s1 -- p1 failed, so try p2
results -> results -- otherwise p1's results are fine
These are a really useful set of primitive operations on parsers. Basically we want to be able to match characters meeting certain criteria, or exact strings.
satisfies :: (Char -> Bool) -> Parser Char
satisfies ok = Parser f
where f [] = [] -- input was empty
f (c:cs)
| ok c = [(c,cs)] -- first char satisfies function
| otherwise = [] -- first char doesn't satisfy function
string :: String -> Parser String
string s = loop s >> return s
where loop "" = return ""
loop (c:cs) = char c >> loop cs
Some examples using these:
ghci> runParser (satisfies isAlpha) "abc"
[('a',"bc")]
ghci> runParser (satisfies isAlpha) "2bc"
[] -- failed parse
ghci> runParser (char 'x') "xyz"
[('x',"yz")]
ghci> runParser (string "hello") "helloworld"
[("hello","world")]
ghci> runParser (string "hello") "helium"
[] -- failed parse
actually executes the parser on a given string. Remember that the list result represents multiple possible ways to parse values of type a
, and the paired String
is the remaining characters that haven’t been used yet. When the list is empty, that means the parse failed.
These return a value without consuming any input at all, for example:
ghci> runParser (pure 99) "hello"
[(99,"hello")]
(>>) :: Parser a -> Parser b -> Parser b -- SEQUENCING
(<|>) :: Parser a -> Parser a -> Parser a -- ALTERNATIVES/CHOICE
These two operators have very similar types, but they do different things. The first represents a sequence – p >> q
means parse using p
and (if that succeeds) continue parsing using q
on the remaining string. However, the result returned is just that of the second parser, q
. The result of p
is ignored:
ghci> runParser (string "hello" >> string "world") "helloworld"
[("world","")]
ghci> runParser (string "hello" <|> string "world") "helloworld"
[("hello","world")]
ghci> runParser (string "hello" <|> string "world") "worldwide"
[("world","wide")]
The alternative type class also gives us these handy operators:
many :: Parser a -> Parser [a] -- Zero or more, in sequence
some :: Parser a -> Parser [a] -- One or more, in sequence
optional :: Parser a -> Parser (Maybe a)
Examples:
ghci> runParser (many (satisfies isAlpha)) "abc123"
[("abc","123")]
ghci> runParser (many (satisfies isAlpha)) "123abc"
[("","123abc")] -- Not a failure, just matches zero characters
ghci> runParser (some (satisfies isAlpha)) "abc123"
[("abc","123")]
ghci> runParser (some (satisfies isAlpha)) "123abc"
[] -- Failure
ghci> runParser (optional (char 'x')) "x18"
[(Just 'x',"18")]
ghci> runParser (optional (char 'x')) "y18"
[(Nothing,"y18")]
Finally, there is the usual applicative pattern, shaped like this:
where f
would be a function expecting four arguments, and the blanks would be replaced with four parsers. The parsers are executed in sequence, and if they all succeed, the results are passed to the function to produce a value. Here are some examples of that pattern:
ghci> runParser ((++) <$> some (satisfies isDigit)
<*> some (satisfies isAlpha)) "12ab34"
[("12ab","34")]
So that one matches digits, followed by alphabetic characters, and appends them together with (++)
.
ghci> runParser ((\a b c -> [a,b,c]) <$> string "hi" <*> pure "lo" <*> string "sa") "hisamo"
[(["hi","lo","sa"],"mo")]
We’ll reintroduce those tree types for arithmetic expressions:
data Arith
= Num Double
| Var String
| UnaryOp UnaryOp Arith
| BinaryOp BinaryOp Arith Arith
deriving (Eq, Show)
data BinaryOp
= Add -- add, +
| Sub -- subtract, -
| Mul -- multiply, *
| Div -- divide, /
| Pow -- power (exponent), **
deriving (Eq, Show)
Let’s begin with parsers for the UnaryOp
:
parseUnaryOp :: Parser UnaryOp
parseUnaryOp =
(string "sqrt" >> return Sqrt) <|>
(string "-" >> return Neg) <|>
(string "abs" >> return Abs)
Then BinaryOp
:
parseBinaryOp :: Parser BinaryOp
parseBinaryOp =
(string "+" >> return Add) <|>
(string "-" >> return Sub) <|>
(string "**" >> return Pow) <|>
(string "*" >> return Mul) <|>
(string "/" >> return Div)
Here it’s important to ensure that the **
symbol appears before the single *
in the alternatives. Otherwise it would prefer to match just a single *
when faced with two. Compare these:
ghci> runParser (string "**" <|> string "*") "**9"
[("**","9")]
ghci> runParser (string "*" <|> string "**") "**9"
[("*","*9")]
Now for parsing variable names. To begin, we’ll only allow variable names that are fully alphabetic:
Here are examples:
ghci> runParser parseVar "myCounter++"
[(Var "myCounter","++")]
ghci> runParser parseVar "my2ndCounter==2"
[(Var "my","2ndCounter==2")]
You can see in the second example that my2ndCounter
is not permitted as a single variable name.
TODO: Revise the definition of parseVar
so that variable names must begin with an alphabetic character, but then can contain alpha-numeric characters after the first one (this is how identifiers are defined in most programming languages). Hint: there’s a function isAlphaNum
in Data.Char
.
Your revised version of parseVar
should support variable names like these:
ghci> runParser parseVar "my2ndCounter==2"
[(Var "my2ndCounter","==2")]
ghci> runParser parseVar "h238a19"
[(Var "h238a19","")]
ghci> runParser parseVar "x"
[(Var "x","")]
ghci> runParser parseVar "x3"
[(Var "x3","")]
ghci> runParser parseVar "3x"
[] -- failed parse; cannot start with digit
To begin, let’s ignore decimal points. We’ll just support integers with an optional negative sign.
parseNum :: Parser Arith
parseNum =
f <$> optional (char '-') <*> some (satisfies isDigit)
where
f Nothing digits = Num (read digits)
f (Just _) digits = Num (read ('-':digits))
We are slightly cheating here, because we’re using the Haskell function read
to do the actual conversion from String
to Double
. It is already a sort of parser for numbers and other types. There are ways to code that conversion ourselves, but it’s just not a technique I want to focus on for now.
So this works for integers, but stops at any decimal point:
ghci> runParser parseNum "138*2"
[(Num 138.0,"*2")]
ghci> runParser parseNum "138.2"
[(Num 138.0,".2")]
ghci> runParser parseNum "3.14159"
[(Num 3.0,".14159")]
ghci> runParser parseNum "-18"
[(Num (-18.0),"")]
ghci> runParser parseNum "x"
[] -- failure
TODO: Revise the definition of parseNum
so that numbers can (optionally) contain decimal parts. The read
function can still be used to do the conversion, but you need to feed it strings of the right form.
Once you have revised parseNum
appropriately, here’s how it should behave:
ghci> runParser parseNum "138*2"
[(Num 138.0,"*2")]
ghci> runParser parseNum "138.2"
[(Num 138.2,"")]
ghci> runParser parseNum "3.14159"
[(Num 3.14159,"")]
ghci> runParser parseNum "-18"
[(Num (-18.0),"")]
ghci> runParser parseNum "-90.05x"
[(Num (-90.05),"x")]