Help with left factoring a grammar to remove left

2019-02-17 03:14发布

问题:

I have a small custom scripting language, and I am trying to update it to allow boolean expressions such as a > 2 and a > 2 and (b < 3 or c > 5). It's the parenthetical expressions that I am having trouble with here.

Here is a (edited since the original post based on the answer from @Bart Kiers) full grammar that exhibits the problem. This is a pared-down version of my actual grammar, but the problem occurs here too.

grammar test;


options {
    language = 'JavaScript'; 
    output = AST;
} 


statement 
    :   value_assignment_statement  
        EOF
    ;


value_assignment_statement 
    :   IDENT
        '='
        expression                      
    ;

value_expression 
    :   value_list_expression           
    |   IDENT                           
    ;


value_list_expression 
    :   value_enumerated_list       
    ;


value_enumerated_list : '{' unary+ '}'
    ;



term 
    :   LPAREN expression RPAREN        
    |   INTEGER                         
    |   value_expression                
    ;

unary : ( '+' | '-' )* term
    ;

mult :  unary ( ('*' | '/') unary)*
    ;

expression : mult ( ('+' | '-') mult )*
    ;


boolean 
    :   boolean_expression
        EOF
    ;

boolean_expression
    :   boolean_or_expression
    ;

boolean_or_expression 
    :   boolean_and_expression (OR boolean_and_expression)*
    ;

boolean_and_expression 
    :   boolean_rel_expression (AND boolean_rel_expression)*
    ;

boolean_rel_expression
    :   boolean_neg_expression relational_operator boolean_neg_expression
    ;

boolean_neg_expression 
    :   (NOT)? atom
    ;

atom
    :   LPAREN boolean_expression RPAREN
    //| expression
    ;


relational_operator : '=' | '>' | '<';


LPAREN      :   '(';
RPAREN      :   ')';
AND         :   'and';
OR          :   'or';
NOT         :   'not';
IDENT       :   LETTER LETTER+;
INTEGER     :   DIGIT+;
WS          :   (' ' | '\n' | '\r' | '\t')+     { $channel = HIDDEN; };

fragment DIGIT      : '0'..'9';
fragment LETTER     : ('a'..'z' | 'A'..'Z');

My attempt to accommodate parenthetical boolean expressions such as a > 2 or (b < 3) is in the commented-out line in the atom rule. When I uncomment this line and include it in the grammar, ANTLR gives me this error:

[fatal] rule atom has non-LL(*) decision due to recursive rule invocations reachable from alts 1,2. Resolve by left-factoring or using syntactic predicates or using backtrack=true option.

I would like to address this by removing the recursion, but I can't seem to make the transition from the Wikipedia description on how to remove left recursion to my own stuff.

In using this grammar, I want sometimes to use statement as a root with input such as abc = 2 + 3, which assigns a value to a variable named abc. Other times I want to use the grammar to evaluate an expression with boolean as the root with input such as abc > 3 and (xyz < 5 or xyz > 10). When I tried to use @Bart's answer as a model, it worked fine until I tried to merge the parts of the grammar used by statement with the parts used by boolean. They should both be able to use an expression, but that's where I'm stuck with this left recursion error.

So, how can I both handle parentheses and avoid the left recursion problem?

回答1:

Boolean expressions are just the same as the additive- and multiplicative expression, and should therefore not be separated from them. Here's how to account for all types of expressions:

grammar test;

parse
  :  expression EOF
  ;

expression 
  :  or
  ;

or
  :  and (OR and)*
  ;

and
  :  rel (AND rel)*
  ;

rel
  :  add (('=' | '>' | '<') add)*
  ;

add
  :  mult (('+' | '-') mult)*
  ;

mult
  :  unary (('*' | '/') unary)*
  ;

unary 
  :  '-' term
  |  '+' term
  |  NOT term
  |  term
  ;

term 
  :  INTEGER  
  |  IDENT       
  |  list
  |  '(' expression ')'
  ;

list 
  :  '{' (expression (',' expression)*)? '}'
  ;

AND     :  'and';
OR      :  'or';
NOT     :  'not';
IDENT   :  LETTER LETTER*;
INTEGER :  DIGIT+;
WS      :  (' ' | '\n' | '\r' | '\t')+  { $channel = HIDDEN; };

fragment DIGIT   : '0'..'9';
fragment LETTER  : ('a'..'z' | 'A'..'Z');

which will parse the example input:

abc > 3 and (xyz < 5 or xyz > {1, 2, 3})

into the following parse tree: