Assignment 10 solutions

Sun Dec 2

Note about submission: The format of this assignment on Mimir is different than previous ones. You may use the web IDE on Mimir, or an editor and GHC on your local system.

  • If using your own system, you can download the starter code as a Zip file from Mimir (but there is only one file in it, A10.hs). You can then upload the modified file.

  • If using the web IDE on Mimir, use this command in the terminal to install GHC:

    apt install ghc

    You might need to repeat that whenever you reopen the IDE? The starter code can be found within the folders at the left, and/or get there this way in the terminal:

    cd cs168_functional_programming/assignment_10
    ghci
    :load A10.hs

Type and instance declarations

This assignment includes the beginnings of an expression parser, that can scan through a stream of characters and build a tree to match arithmetic expressions. We begin with all the type and instance definitions that we covered in class…below this code there will be a friendlier summary of the available operations.

Helpful primitives

These are a really useful set of primitive operations on parsers. Basically we want to be able to match characters meeting certain criteria, or exact strings.

Some examples using these:

ghci> runParser (satisfies isAlpha) "abc"
[('a',"bc")]
ghci> runParser (satisfies isAlpha) "2bc"
[]    -- failed parse
ghci> runParser (char 'x') "xyz"
[('x',"yz")]
ghci> runParser (string "hello") "helloworld"
[("hello","world")]
ghci> runParser (string "hello") "helium"
[]    -- failed parse

Summary of parsing operators

actually executes the parser on a given string. Remember that the list result represents multiple possible ways to parse values of type a, and the paired String is the remaining characters that haven’t been used yet. When the list is empty, that means the parse failed.

These return a value without consuming any input at all, for example:

ghci> runParser (pure 99) "hello"
[(99,"hello")]

These two operators have very similar types, but they do different things. The first represents a sequencep >> q means parse using p and (if that succeeds) continue parsing using q on the remaining string. However, the result returned is just that of the second parser, q. The result of p is ignored:

ghci> runParser (string "hello" >> string "world") "helloworld"
[("world","")]
ghci> runParser (string "hello" <|> string "world") "helloworld"
[("hello","world")]
ghci> runParser (string "hello" <|> string "world") "worldwide"
[("world","wide")]

The alternative type class also gives us these handy operators:

Examples:

ghci> runParser (many (satisfies isAlpha)) "abc123"
[("abc","123")]
ghci> runParser (many (satisfies isAlpha)) "123abc"
[("","123abc")]    -- Not a failure, just matches zero characters
ghci> runParser (some (satisfies isAlpha)) "abc123"
[("abc","123")]
ghci> runParser (some (satisfies isAlpha)) "123abc"
[]    -- Failure
ghci> runParser (optional (char 'x')) "x18"
[(Just 'x',"18")]
ghci> runParser (optional (char 'x')) "y18"
[(Nothing,"y18")]

Finally, there is the usual applicative pattern, shaped like this:

where f would be a function expecting four arguments, and the blanks would be replaced with four parsers. The parsers are executed in sequence, and if they all succeed, the results are passed to the function to produce a value. Here are some examples of that pattern:

ghci> runParser ((++) <$> some (satisfies isDigit)
                   <*> some (satisfies isAlpha)) "12ab34"
[("12ab","34")]

So that one matches digits, followed by alphabetic characters, and appends them together with (++).

ghci> runParser ((\a b c -> [a,b,c]) <$> string "hi" <*> pure "lo" <*> string "sa") "hisamo"
[(["hi","lo","sa"],"mo")]

Arithmetic tree types

We’ll reintroduce those tree types for arithmetic expressions:

A start on expression parsers

Let’s begin with parsers for the UnaryOp:

Then BinaryOp:

Here it’s important to ensure that the ** symbol appears before the single * in the alternatives. Otherwise it would prefer to match just a single * when faced with two. Compare these:

ghci> runParser (string "**" <|> string "*") "**9"
[("**","9")]
ghci> runParser (string "*" <|> string "**") "**9"
[("*","*9")]

Variable names

Now for parsing variable names. To begin, we’ll only allow variable names that are fully alphabetic:

Here are examples:

ghci> runParser parseVar "myCounter++"
[(Var "myCounter","++")]
ghci> runParser parseVar "my2ndCounter==2"
[(Var "my","2ndCounter==2")]

You can see in the second example that my2ndCounter is not permitted as a single variable name.

TODO: Revise the definition of parseVar so that variable names must begin with an alphabetic character, but then can contain alpha-numeric characters after the first one (this is how identifiers are defined in most programming languages). Hint: there’s a function isAlphaNum in Data.Char.

Solution:

Your revised version of parseVar should support variable names like these:

ghci> runParser parseVar "my2ndCounter==2"
[(Var "my2ndCounter","==2")]
ghci> runParser parseVar "h238a19"
[(Var "h238a19","")]
ghci> runParser parseVar "x"
[(Var "x","")]
ghci> runParser parseVar "x3"
[(Var "x3","")]
ghci> runParser parseVar "3x"
[]   -- failed parse; cannot start with digit

Floating-point numbers

To begin, let’s ignore decimal points. We’ll just support integers with an optional negative sign.

We are slightly cheating here, because we’re using the Haskell function read to do the actual conversion from String to Double. It is already a sort of parser for numbers and other types. There are ways to code that conversion ourselves, but it’s just not a technique I want to focus on for now.

So this works for integers, but stops at any decimal point:

ghci> runParser parseNum "138*2"
[(Num 138.0,"*2")]
ghci> runParser parseNum "138.2"
[(Num 138.0,".2")]
ghci> runParser parseNum "3.14159"
[(Num 3.0,".14159")]
ghci> runParser parseNum "-18"
[(Num (-18.0),"")]
ghci> runParser parseNum "x"
[]    -- failure

TODO: Revise the definition of parseNum so that numbers can (optionally) contain decimal parts. The read function can still be used to do the conversion, but you need to feed it strings of the right form.

Solution:

Once you have revised parseNum appropriately, here’s how it should behave:

ghci> runParser parseNum "138*2"
[(Num 138.0,"*2")]
ghci> runParser parseNum "138.2"
[(Num 138.2,"")]
ghci> runParser parseNum "3.14159"
[(Num 3.14159,"")]
ghci> runParser parseNum "-18"
[(Num (-18.0),"")]
ghci> runParser parseNum "-90.05x"
[(Num (-90.05),"x")]