Left Factoring & Removing Left Recursion JavaCC

2019-05-28 01:42发布

问题:

I have a grammar which I have to use JJTree and JavaCC to create a symbol table and an AST. While I fully understand the sections of my assignment to create the table and tree, the grammar I was given is ambiguous, contains left recursion and indirect left recusion. It also needs to be left factored. I have trawled all over the internet to try find methods that would work for me.

For example:

A ::= Aα | β

can be changed to:

A ::= βA'
A' ::= αA' | ε

But I don't know how to apply this to my grammar.
Here is a section of the production rules I wrote from the grammar that contains the problems aforementioned.

void statement() #STM : {}
{
   identifier() <ASSIGNMENT> expression()
   | identifier() <ASSIGNMENT> <STRING>
   | <EXCLAMATION> expression()
   | <QUESTION> identifier()
   | identifier() <LBR> arg_list() <RBR>
   | <BEGIN> (statement() <SEMIC>)+ <END>
   | matched()
   | unmatched()
   | <WHILE> <LBR> condition() <RBR> <DO> statement()
   | {}
}

void matched() #void : {}
{
    <IF> condition() <THEN> matched() <ELSE> matched()
}

void unmatched() #void : {}
{
    <IF> condition() <THEN> statement()
    |  <IF> condition() <THEN> matched() <ELSE> unmatched()
}

void expression() #EXPR : {}
{
    fragment() ((<PLUS>|<MINUS>|<MULT>|<DIV>) fragment())*
}

void fragment() #FRAG : {}
{
    (identifier() | <NUM> | (<PLUS>|<MINUS>) fragment() | expression())
}

回答1:

You have a number of problems here. Most are dealt with in question 4.6 of the JavaCC FAQ. http://www.engr.mun.ca/~theo/JavaCC-FAQ/

First, there is a lot of left-factoring to do. Left factoring tries to move choices to later in the parse. E.g. if you have

void statement() #STM : {}
{
   identifier() <ASSIGNMENT> expression()
 | identifier() <ASSIGNMENT> <STRING>
 | identifier() <LBR> arg_list() <RBR>
}

and the parser is expecting a statement and the next item of input is an identifier, then the parser can't make the choice. Factor out the common parts on the left to get

void statement() #STM : {}
{
   identifier()
      (   <ASSIGNMENT> expression()
      |   <ASSIGNMENT> <STRING>
      |   <LBR> arg_list() <RBR>
      )
}

and then

void statement() #STM : {}
{
   identifier()
      (   <ASSIGNMENT> ( expression() | <STRING> )
      |   <LBR> arg_list() <RBR>
      )
}

Second, the nonterminal "matched" is useless, as there is no nonrecursive case. I suspect that you are trying to deal with the dangling else problem. This is not a good way to deal with the dangling else problem. See the JavaCC FAQ for a sensible way to deal with it.

Third, there is mutual left recursion between nonterminals "fragment" and "expression". I'm not sure what you are trying to accomplish here. There are several ways to deal with parsing expressions that don't use left recursion. See http://www.engr.mun.ca/~theo/Misc/exp_parsing.htm for more information. My tutorial introduction to JavaCC might also help. http://www.engr.mun.ca/~theo/JavaCC-Tutorial/

Finally a word of advice. Start with a grammar for a small subset of your language and then add constructs one or two at a time. That way you won't have to deal with a lot of problems at once.