antlr3 remove treenode with subtree

Posted 2019-07-22 12:21

Question:

I am trying to do a tree-to-tree transformation with ANTLR 3.4.

For this question, it is about boolean expressions where "AND" and "OR" are allowed to bind n expressions. The parser stage creates something like this:

 (OR 
   (AND (expr1) (expr2) (expr3) 
     (OR (AND (expr4))
         (AND (expr5))
         (AND (expr6))
     )
   )
 )

Unfortunately, there are AST nodes for "AND" and "OR" that bind just one expression. (These are useless, but the rules andExpr and orExpr are invoked regardless.)

I tried to remove them (that is, replace them with their subnodes) but failed to do so in a tree grammar. (BTW: a depth-first tree traversal/modification in pure Java works, but that's not what I'm after.)
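For reference, the pure-Java depth-first approach mentioned above could look roughly like the sketch below. It is only an illustration under stated assumptions: `Node` and `SingleChildCollapser` are hypothetical stand-ins (with ANTLR 3 the node role would be played by `CommonTree`), and the labels "AND" and "OR" are matched by name rather than by token type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an AST node; not part of the ANTLR runtime.
class Node {
    final String label;
    final List<Node> children = new ArrayList<>();

    Node(String label, Node... kids) {
        this.label = label;
        children.addAll(Arrays.asList(kids));
    }

    // LISP-style rendering, similar in spirit to toStringTree().
    String toStringTree() {
        if (children.isEmpty()) return label;
        StringBuilder sb = new StringBuilder("(").append(label);
        for (Node c : children) sb.append(' ').append(c.toStringTree());
        return sb.append(')').toString();
    }
}

class SingleChildCollapser {
    // Post-order traversal: collapse the children first, then decide about this node.
    static Node collapse(Node n) {
        for (int i = 0; i < n.children.size(); i++) {
            n.children.set(i, collapse(n.children.get(i)));
        }
        boolean isOperator = n.label.equals("AND") || n.label.equals("OR");
        if (isOperator && n.children.size() == 1) {
            return n.children.get(0); // replace the node by its only child
        }
        return n;
    }

    public static void main(String[] args) {
        // The tree from the question.
        Node tree = new Node("OR",
            new Node("AND", new Node("expr1"), new Node("expr2"), new Node("expr3"),
                new Node("OR",
                    new Node("AND", new Node("expr4")),
                    new Node("AND", new Node("expr5")),
                    new Node("AND", new Node("expr6")))));
        // prints (AND expr1 expr2 expr3 (OR expr4 expr5 expr6))
        System.out.println(collapse(tree).toStringTree());
    }
}
```

Working bottom-up matters here: the outer OR only becomes a single-child candidate for removal once its subtree has been processed, and inner nodes such as `(AND (expr4))` are gone before their parent is inspected.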

I tried to use predicates, but I can't seem to get it right.

This is the grammar that parses the stream unmodified:

 start  :  
   orExpr^   EOF!  
   ;

 orExpr       :  
   ^(OR  r+=andExpr+ )   -> ^(OR $r)
   ;

 andExpr  : 
   ^(AND unaryExpr+ )
   ; 

 notExpr:
   ^( NOT unaryExpr)    
   ;

 unaryExpr : 
   .+  // it gets more complicated below this
   ;

I tried a predicate to catch the one-subnode case, but I fail to pass the n>1 case through unmodified:

 orExpr @init { int N = 0; }
   :  
   ( ^(OR  (r+=andExpr {N++;})+ )  {N==1}? -> $r) 
   ;

Any ideas how to do it right?

Edit: attached is the parser grammar, which is pretty much the same...

 start 
   :  '('! orExpr^  ')'! EOF!        ;
 orExpr
   : a+=andExpr (  OR_T a+=andExpr )*  -> ^(OR  $a+ )  // 'AND' and 'OR' are multivalent
   ;

 andExpr
   : u+=unaryExpr ( AND_T u+=unaryExpr )* -> ^(AND $u+ )
   ; 

 notExpr
   : NOT_T unaryExpr -> ^( NOT unaryExpr)   
   ;

 unaryExpr
   : '('!  orExpr ')'! // -> ^( BRACE orExpr), brace not needed in the AST (but needed for proper parsing)
   |   notExpr
   |   internal^  // internal is very complex in itself
   ;

Answer 1:

You can do this directly in the parser. You do need to create some extra parser rules so as not to confuse ANTLR in the rewrite rules (see the inline comments):

grammar T;

options {
  output=AST;
  ASTLabelType=CommonTree;
}

start 
 : orExpr EOF! {System.out.println($orExpr.tree.toStringTree());}
 ;

orExpr
 : (andExpr2 -> andExpr2) ((OR andExpr)+ -> ^(OR andExpr2 andExpr+))?
 ;

// You can't use `andExpr` directly in the `orExpr` rule otherwise the rewrite
// rule `-> ^(OR ... )` gets confused.
andExpr2 : andExpr;

andExpr
 : (notExpr2 -> notExpr2) ((AND notExpr)+ -> ^(AND notExpr2 notExpr+))?
 ; 

notExpr2 : notExpr;

notExpr
 : NOT^ notExpr
 | atom  
 ;

atom
 : '(' orExpr ')' -> orExpr
 | ID
 ;

OR    : '||';
AND   : '&&';
NOT   : '!';
ID    : 'a'..'z'+;
SPACE : ' ' {skip();};

Parsing input like "a && b && c || d || f || g" will produce an AST with a single OR root whose children are one AND node (over a, b and c) followed by d, f and g.
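The idea behind the rewrite rules in the grammar above (start with the first operand alone, and only build an operator node once a second operand shows up) can be imitated in plain Java. The following is a hand-written sketch, not ANTLR-generated code: `FlatBoolParser` and its methods are hypothetical names, it builds the LISP-style string directly instead of a `CommonTree`, and it strips spaces up front instead of using a lexer.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written sketch that mirrors the shape of the rewrite rules; not ANTLR output.
class FlatBoolParser {
    private final String src;
    private int pos = 0;

    // Simplification: spaces are removed up front, so "a b" would read as the ID "ab".
    FlatBoolParser(String src) { this.src = src.replace(" ", ""); }

    static String parse(String input) {
        FlatBoolParser p = new FlatBoolParser(input);
        String tree = p.orExpr();
        if (p.pos != p.src.length()) throw new IllegalArgumentException("unexpected input at " + p.pos);
        return tree;
    }

    // orExpr : andExpr ('||' andExpr)* ; a node is built only for two or more operands
    private String orExpr() {
        List<String> operands = new ArrayList<>();
        operands.add(andExpr());
        while (src.startsWith("||", pos)) { pos += 2; operands.add(andExpr()); }
        return wrap("||", operands);
    }

    // andExpr : notExpr ('&&' notExpr)*
    private String andExpr() {
        List<String> operands = new ArrayList<>();
        operands.add(notExpr());
        while (src.startsWith("&&", pos)) { pos += 2; operands.add(notExpr()); }
        return wrap("&&", operands);
    }

    // notExpr : '!' notExpr | atom
    private String notExpr() {
        if (src.startsWith("!", pos)) { pos++; return "(! " + notExpr() + ")"; }
        return atom();
    }

    // atom : '(' orExpr ')' | ID ; the parentheses leave no trace in the tree
    private String atom() {
        if (src.startsWith("(", pos)) {
            pos++;
            String inner = orExpr();
            if (!src.startsWith(")", pos)) throw new IllegalArgumentException("expected ')' at " + pos);
            pos++;
            return inner;
        }
        int start = pos;
        while (pos < src.length() && src.charAt(pos) >= 'a' && src.charAt(pos) <= 'z') pos++;
        if (start == pos) throw new IllegalArgumentException("expected identifier at " + pos);
        return src.substring(start, pos);
    }

    // A single operand is passed through as-is; two or more get an operator root.
    private static String wrap(String op, List<String> operands) {
        if (operands.size() == 1) return operands.get(0);
        return "(" + op + " " + String.join(" ", operands) + ")";
    }

    public static void main(String[] args) {
        // prints (|| (&& a b c) d f g)
        System.out.println(parse("a && b && c || d || f || g"));
    }
}
```

The `wrap` helper plays the role of the optional `-> ^(OR ...)` and `-> ^(AND ...)` rewrites: a lone `d` never gets an AND parent, which is exactly the single-child node the question wants to avoid.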

EDIT

The tree grammar would then look like this:

tree grammar TWalker;

options {
  tokenVocab=T;
  ASTLabelType=CommonTree;
}

start 
 : expr
 ;

expr
 : ^(OR expr+)
 | ^(AND expr+)
 | ^(NOT expr)
 | ID
 ;