Scala Parser Combinator, Ambiguous Grammar & Parse

2020-05-27 09:16发布

问题:

I am trying to get the parser to return all possible parse results (parse forest) from an ambiguous grammar and choose from the parse forest by evaluating them against user context / history and a knowledge base. For performance reason, this should probably be done with the packrat parser and a search limit / upper-bound parameter to limite the number of recursive calls in applying production rules to avoid infinite loops.

Being new to both Scala and its Parser Combinators, I can't figure out how to do this or whether it can be done at all. Could someone please help? Much appreciated.

Best regards, Thomas Juan

回答1:

You cannot do this with Scala's built-in parser combinators. Packrat combinators are restricted to only unambiguous grammars. If you try to deal with ambiguity, you lose the ability to memoize and the parse complexity becomes O(k^n) even for unambiguous trails. So, don't do that.

What you want is a parser combinator framework that correctly handles ambiguity. As far as I know, the only such framework for Scala is my GLL combinators. The syntax is almost identical to Scala's parser combinators (the main difference being that higher-arity functions work correctly), but the output is a Stream[A], where A is the result type. Thus, you get a parse forest. Note that the result is not actually a SPPF (shared packed parse forest), so if you produce an exponential number of results, the stream will have proportional size. This was done to maintain ultimate flexibility in the result type.

GLL combinators is O(n^3) in the worst case, even for pathologically ambiguous grammars. In the average case, where the parse trail is unambiguous, GLL combinators is O(n). The constant time overhead is currently a bit excessive, but optimization is an ongoing project. We use GLL combinators in production at Precog, so you can expect the library to be well maintained going forward.