As a purely academic exercise, I'm writing a recursive descent parser from scratch -- without using ANTLR or lex/yacc.
I'm writing a simple function which converts math expressions into their equivalent AST. I have the following:
// grammar
type expr =
| Lit of float
| Add of expr * expr
| Mul of expr * expr
| Div of expr * expr
| Sub of expr * expr
// tokens
type tokens =
| Num of float
| LParen | RParen
| XPlus | XStar | XMinus | XSlash
let tokenize (input : string) =
Regex.Matches(input.Replace(" ", ""), "\d+|[+/*\-()]")
|> Seq.cast<Match>
|> Seq.map (fun x -> x.Value)
|> Seq.map (function
| "+" -> XPlus
| "-" -> XMinus
| "/" -> XSlash
| "*" -> XStar
| "(" -> LParen
| ")" -> RParen
| num -> Num(float num))
|> Seq.to_list
So, tokenize "10 * (4 + 5) - 1"
returns the following token stream:
[Num 10.0; XStar; LParen; Num 4.0; XPlus; Num 5.0; RParen; XMinus; Num 1.0]
At this point, I'd like to map the token stream to its AST with respect to operator precedence:
Sub(
Mul(
Lit 10.0
,Add(Lit 4.0, Lit 5.0)
)
,Lit 1.0
)
However, I'm drawing a blank. I've never written a parser from scratch, and I don't know even in principle how to begin.
How do I convert a token stream its representative AST?
Do you know about language grammars?
Assuming yes, you have a grammar with rules along the lines
which ends up turning into code like (writing code in browser, never compiled)
Hopefully you see the basic idea. This assumes a global mutable token stream that allows both 'peek at next token' and 'consume next token'.
If I remember from college classes the idea was to build expression trees like:
then once you have construction your tree completely so you get something like:
then you run your completed tree through another program that recursively descents into the tree calculating expressions until you have an answer. If your parser doesn't understand the tree, you have a syntax error. Hope that helps.