I have been searching around the interwebs for a couple of days, trying to get an answer to my questions and i'm finally admitting defeat.
I have been given a grammar:
Dig ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Int ::= Dig | Dig Int
Var ::= a | b | ... z | A | B | C | ... | Z
Expr ::= Int | - Expr | + Expr Expr | * Expr Expr | Var | let Var = Expr in Expr
And i have been told to parse, evaluate and print expressions using this grammar
where the operators * + -
has their normal meaning
The specific task is to write a function parse :: String -> AST
that takes a string as input and returns an abstract syntax tree when the input is in the correct format (which i can asume it is).
I am told that i might need a suitable data type and that data type might need to derive from some other classes.
Following an example output
data AST = Leaf Int | Sum AST AST | Min AST | ...
Further more, i should consider writing a function
tokens::String -> [String]
to split the input string into a list of tokens
Parsing should be accomplished with
ast::[String] -> (AST,[String])
where the input is a list of tokens and it outputs an AST, and to parse sub-expressions i should simply use the ast function recursively.
I should also make a printExpr method to print the result so that
printE: AST -> String
printE(parse "* 5 5")
yields either "5*5"
or "(5*5)"
and also a function to evaluate the expression
evali :: AST -> Int
I would just like to be pointed in the right direction of where i might start. I have little knowledge of Haskell and FP in general and trying to solve this task i made some string handling function out of Java which made me realize that i'm way off track.
So a little pointer in the right direction, and maybe an explantion to 'how' the AST should look like
Third day in a row and still no running code, i really appreciate any attempt to help me find a solution!
Thanks in advance!
Edit
I might have been unclear: I'm wondering how i should go about from having read and tokenized an input string to making an AST.
To make a start with Haskell itself, I'd recommend Learn You a Haskell for Great Good!, a very well-written and entertaining guide. Real World Haskell is another oft-recommended starting point.
Edit: A more fundamental introduction to parsing is Functional Parsers. I wanted How to replace a failure by a list of successes by PHilip Wadler. Sadly, it doesn't seem to be available online.
To make a start with parsing in Haskell, I think you should first read monadic parsing in Haskell, then maybe this wikibook example, but also then the parsec guide.
Your grammar
suggests a few abstract data types:
This is inherently a recursively defined syntax tree, no need to make another one. Sorry about the clunky
Dig_
andInteg_
stuff - they have to start with an uppercase.(Personally I'd want to be converting the
Integ
s toInt
s straight away, so would have donenewtype Integ = Integ Int
, and would probably have donenewtype Var = Var Char
but that might not suit you.)Once you've done with the basic ones -
dig
andvar
, andneg_
,plus_
,in_
etcI'd go with the Applicative interface to build them up, so for example your parserexpr
forExpr
would be something likeSo nearly all the time, your Haskell code is clean and closely resembles the grammar you were given.
Parsing tokens into an Abstract Syntax Tree
OK, let's take your grammar
This is a nice easy grammar, because you can tell from the first token what sort of epression it will be. (If there was something more complicated, like
+
coming between numbers, or-
being used for subtraction as well as negation, you'd need the list-of-successes trick, explained in Functional Parsers.)Let's have some sample raw input:
Which I understand from the grammar represents
"(- 6 (+ 45 (let x=-5 in (* x x))))"
, and I'll assume you tokenised it aswhich fits the grammar, but you might well have got
which fits your sample
AST
better. I think it's good practice to name your AST after bits of your grammar, so I'm going to go ahead and replacewith
I've named the bits of an
E_Let
to make it clearer what they represent.Writing a parsing function
You could use
isDigit
by addingimport Data.Char (isDigit)
to help out:Yikes! Too many let clauses obscuring the meaning, and we'll be writing the same code for
E_Prod
and very much worse forE_Let
. Let's get this sorted out!The standard way of dealing with this is to write some combinators; instead of tiresomely threading the input
[String]
s through our definition, define ways to mess with the output of parsers (map) and combine multiple parsers into one (lift).Clean up the code 1: map
First we should define
pmap
, our own equivalent of themap
function so we can dopmap E_Neg (expr1 ss)
instead oflet (e,ss') = expr1 ss in (E_Neg e,ss')
nonono, I can't even read that! We need a type synonym:
But really this would be better if I did
so I could do
I'll leave that for you to figure out if you fancy.
Clean up the code 2: combining two parsers
We also need to deal with the situation when we have two parsers to run, and we want to combine their results using a function. This is called lifting the function to parsers.
or maybe even three parsers:
I'll let you think how to do that. In the let statement you'll need
liftP5
to parse the sections of a let statement, lifting a function that ignores the"="
and"in"
. You could makeand a couple more to help out with this.
Actually,
pmap
could also be calledliftP1
, but map is the traditional name for that sort of thing.Rewritten with the nice combinators
Now we're ready to clean up
expr
:That'd all work fine. Really, it's OK. But
liftP5
is going to be a bit long, and feels messy.Taking the cleanup further - the ultra-nice Applicative way
Applicative Functors is the way to go. Remember I suggested refactoring as
so you could make it an instance of
Functor
? Perhaps you don't want to, because all you've gained is a new namefmap
for the perfectly workingpmap
and you have to deal with all thosePar
constructors cluttering up your code. Perhaps this will make you reconsider, though; we canimport Control.Applicative
, then using thedata
declaration, we can define<*>
, which sort-of meansthen
and use<$>
instead ofpmap
, with*>
meaning<*>-but-forget-the-result-of-the-left-hand-side
so you would writeWhich looks a lot like your grammar definition, so it's easy to write code that works first time. This is how I like to write Parsers. In fact, it's how I like to write an awful lot of things. You'd only have to define
fmap
,<*>
andpure
, all simple ones, and no long repetativeliftP3
,liftP4
etc.Read up about Applicative Functors. They're great.
Hints for making Parser applicative:
pure
doesn't change the list.<*>
is likeliftP2
, but the function doesn't come from outside, it comes as the output fromp1
.OK, so it seems you're trying to build lots and lots of stuff, and you're not really sure exactly where it's all going. I would suggest that getting the definition for
AST
right, and then trying to implementevali
would be a good start.The grammer you've listed is interesting... You seem to want to input
* 5 5
, but output5*5
, whic is an odd choice. Is that really supposed to be a unary minus, not binary? Simlarly,* Expr Expr Var
looks like perhaps you might have meant to type* Expr Expr | Var
...Anyway, making some assumptions about what you meant to say, your AST is going to look something like this:
Now, let us try to do
printE
. It takes an AST and gives us a string. By the definition above, the AST has to be one of five possible things. You just need to figure out what to print for each one!show
turns anInt
into aString
.++
joins two strings together. I'll let you work out the rest of the function. (The tricky thing is if you want it to print brackets to correctly show the order of subexpressions... Since your grammer doesn't mention brackets, I guess no.)Now, how about
evali
? Well, this is going to be a similar deal. If the AST is aLeaf x
, thenx
is anInt
, and you just return that. If you have, say,Minus x
, thenx
isn't an integer, it's an AST, so you need to turn it into an integer withevali
. The function looks something likeThat works great so far. But wait! It looks like you're supposed to be able to use
Let
to define new variables, and refer to them later withVar
. Well, in that case, you need to store those variables somewhere. And that's going to make the function more complicated.My recommendation would be to use
Data.Map
to store a list of variable names and their corresponding values. You'll need to add the variable map to a type signature. You can do it this way:So
evali
now just callsevaluate
with an empty variable map. Whenevaluate
seesLet
, it adds the variable to the map. And when it seesVar
, it looks up the name in the map.As for parsing a string into an AST in the first place, that is an entire other answer again...