I've been writing a little monadic parser-combinator library in F# (somewhat similar to FParsec) and now tried to implement a small parser for a programming language.
I first implemented the code in Haskell (with Parsec) which ran perfectly well. The parsers for infix expressions are designed mutually recursive.
parseInfixOp :: Parser String -> Parser Expression -> Parser Expression
parseInfixOp operatorParser subParser = ignoreSpaces $ do
x <- ignoreSpaces $ subParser
do
op <- ignoreSpaces $ operatorParser
y <- parseInfixOp operatorParser subParser
return $ BinaryOp op x y
<|> return x
parseInfix :: [String] -> Parser Expression -> Parser Expression
parseInfix list = parseInfixOp (foldl1 (<|>) $ map string list)
parseExpr :: Parser Expression
parseExpr = parseInfix0
parseInfix0 = parseInfix ["==", "<>", "And", "Or", "Xor", "<", ">", "<=", ">="] parseInfix1
parseInfix1 = parseInfix ["+", "-", "Mod"] parseInfix2
parseInfix2 = parseInfix ["*", "/", "\\"] parseInfix3
parseInfix3 = parseInfix ["^"] parseInfix4
parseInfix4 = parseFactor
parseFactor :: Parser Expression
parseFactor = parseFactor' <|> (betweenChars '(' ')' parseExpr)
parseFactor' :: Parser Expression
parseFactor' = parseString
<|> parseBool
<|> parseNumber
<|> parseVariable
<|> (try parseFunCall) <|> parseIdentifier
Since the order of functions doesn't matter and Haskell is evaluating in a non-strict way, this is OK, but F# is evaluating strictly.
let rec parseExpr = parseInfix0
and parseFactor = (parseFactor') <|> (betweenChars '(' ')' parseExpr)
and parseInfix2 = parseInfix ["^"] parseFactor BinaryOp
and parseInfix1 = parseInfix ["*"; "/"] parseInfix2 BinaryOp
and parseInfix0 = parseInfix ["+"; "-"] parseInfix1 BinaryOp
and parseFunCall = parser {
let! first = letter
let! rest = many (letter <|> digit)
let funcName = toStr $ first::rest
do! ignoreSpace
let! args = betweenChars '(' ')' $ sepBy (parseExpr) ","
return FunCall(funcName, args)
}
and parseFactor' =
parseNumber
<|> parseString
<|> parseBool
<|> parseFunCall
<|> parseIdentifier
F# now either complains about recursive objects and just throws a StackOverflowException
at runtime due to an infinite loop or it even doesn't compile the source because "a value would be part of its own definion".
What's the best way to prevent this errors. The debugger advices me to make use functions or lazy
s instead but what should I make lazy here?
η-conversion is not necessarily a great solution - if you do this, you'll have to prove that the delayed function is run at most once, or pay a lot of overhead for calling it during parsing.
You want something like this:
A better way, if you have a workflow builder, is to define the Delay member to do this for you:
Neither the eta rewrite nor the lazy delaying is a sure thing. The F# compiler seems to have a hard time dealing with deep recursions. What worked for me was to collapse the recursion into a single top level function (by passing the function to be called recursively as an argument). This top level is written eta style.
Top level, I have:
Note wikipedia says: "In a strict language like OCaml, we can avoid the infinite recursion problem by forcing the use of a closure. ". That is what worked for me.
What is the warning about recursive objects? Show the text; there's one such warning that is ignorable (indeed, in a sense desirable) for this case.
If it doesn't compile because of recursive values, you simply need to turn the 'syntactic values' into 'syntactic functions'. That is, rather than
use
even though the type of 'parseInfix2' is the same function type either way... F# (unlike Haskell) will sometimes require you to be explicit (do eta-conversion as above).
I'd ignore suggestions about inserting 'lazy', parsers are indeed functions, not values, so the eta-conversion will cover the same issue (none of this will be evaluated eagerly at all, it all needs to 'wait' until you pass the string to be parsed before anything starts to 'run').
Regarding StackOverflowExceptions, if you post a looping snippet of the stack it may help, but you maybe can see for yourself e.g. if you have a left-recursive portion of the grammar that doesn't consume any input and gets caught in a loop. I think that's an easy pitfall for most parsing technologies in most languages.