Tail-recursion in FParsec

2019-05-17 21:27发布

I have encounter a problem with parsers having two branches of recursion. To demonstrate the problem easier, I use a simple grammar of a lambda calculus from the article written by Luca Bolognese as the example:

<expression> ::= <name> | <function> | <application>  
<name> ::= non­blank character sequence  
<function> ::= \ <name> . <body>  
<body> ::= <expression>  
<application> ::= ( <function expression> <argument expression> )  
<function expression> ::= <expression>  
<argument expression> ::= <expression>

The parser in the article is quite concise:

let ws = " \t\n" 
let specialChars = ".)(\\\n" 

let pWs = spaces 
let pName = manyChars (noneOf (ws + specialChars)) |>> EName 

let pExpr, pExprRef = createParserForwardedToRef<Expression, Unit>() 

let curry2 f a b = f(a,b) 
let pFunction = pchar '\\' >>. pipe2 pName (pchar '.' >>. pExpr) (curry2 Function) 

let pApplication = pchar '(' >>. pipe2 pExpr (pWs >>. pExpr) (curry2 Application)
                            .>> pWs .>> pchar ')'

do pExprRef := pFunction <|> pApplication <|> pName 

let pExpressions = sepBy pExpr spaces1 

let fparseString text = 
    match run pExpressions text with 
    | Success(result, _, _)   -> result 
    | Failure(errorMsg, _, _) -> failwith (sprintf "Failure: %s" errorMsg) 

I'm interested in pApplication since they consist of two pExprs which in turn could be pApplications too. The parser runs out of stack space in the following benchmark:

let generateString level =
    let rec loop i =
        seq {
                if i < level then
                    yield "("
                    yield! loop level
                    yield " "
                    yield! loop (i+1)
                    yield ")"
                else 
                    yield "(x x)"
        }
    loop 0 |> String.concat ""

let N = 5000
let s = generateString N;; 
let _ = fparseString s;;

How can I rewrite the parser to be tail-recursive?

I recognized the problem when trying to write a parser for a Lisp-like language and test it with real benchmarks. I have Term and VarBinding which are mutually recursive types and a let parser which exhibits the same issue as pApplication above. My parser is on github in case the analysis is wrong regarding the not tail-recursive problem.

1条回答
Bombasti
2楼-- · 2019-05-17 22:11

The built-in parser combinators of FParsec generally don't allow for a tail-recursive parser implementation, mainly because of the way the error handling is implemented.

This means that the recursion depth of an FParsec parser is limited by the stack size -- as is the case for most recursive descent parsers. Of course, this doesn't affect the parsing of sequences with many, sepBy, chainl etc., because these FParsec combinators all have implementations with constant stack space usage.

The default stack size in .NET is usually more than enough for parsing human-generated input in well-specified formats with FParsec (assuming you parse sequences with the built-in combinators). However, machine-generated input with a deeply nested structure (like your 5000 level deep s-expressions) can easily lead to a stack overflow.

If you know in advance that the nesting depth in your input is limited to a reasonable value, you could simply increase the stack size to an appropriate value. Hopefully this is true for your benchmark data.

Otherwise you may have to implement a special purpose parser function for the recursive element(s) of your grammar. You would need to implement this function using the low-level API of FParsec and you would obviously have to implement this function such that it uses the heap instead of the stack for keeping track of the nesting context and intermediate parsing results.

查看更多
登录 后发表回答