Antlr AST generating (possible) madness

2019-08-06 00:32发布

问题:

Is the following even possible? I want to "reverse" the input given to antlr and make each token a child of the previous one.

So, for the input (Assume each token is separated by the '.' char) :

Stack.Overflow.Horse

I would like my grammar to produce the following AST:

Horse
  |---Overflow
         |---Stack

So far, I've managed to reverse the nodes, but I'm unable to make them children of each other:

function
 : ID PERIOD function
   -> function ID
 | ID
 ;

ID  : 'a'..'z'*
    ;

回答1:

I don't think there's an easy way to do that. You could make your rule like this:

function
 : ID PERIOD function
   -> ^(function ID)
 | ID
 ;

but that only makes the last node the root and all other nodes its children. For example, the following source:

a.b.c.d.e

will result in the following tree:

    e
 / / \ \
d c   b a

I can't see an easy fix since when you first parse a.b.c.d.e, a will be the ID and b.c.d.e the recursive call to function:

a.b.c.d.e
| +-----+
|    |
|    `-----> function
|
`----------> ID

resulting in the fact that b.c.d.e will have a as its child. When then b becomes the ID, it too is added as a child next to a. In your case, a should be removed as a child and then added to the list of b's children. But AFAIK, that is not possible in ANLTR (at least, not in a clean way inside the grammar).


EDIT

Okay, as a work-around I had something elegant in mind, but that didn't work as I had hoped. So, as a less elegant solution, you could match the last node as the root in your rewrite rule:

function
  :  (id '.')* last=id -> ^($last)
  ;

and then collect all possible preceding nodes (children) in a List using the += operator:

function
  :  (children+=id '.')* last=id -> ^($last)
  ;

and use a custom member-method in the parser to "inject" these children into the root of your tree (going from right to left in your List!):

function
  :  (children+=id '.')* last=id {reverse($children, (CommonTree)$last.tree);} -> ^($last)
  ;

A little demo:

grammar ReverseTree;

options {
  output=AST;
}

tokens {
  ROOT;
}

@members {
  private void reverse(List nodes, CommonTree root) {
    if(nodes == null) return;
    for(int i = nodes.size()-1; i >= 0; i--) {
      CommonTree temp = (CommonTree)nodes.get(i);
      root.addChild(temp);
      root = temp;
    }
  }
}

parse
  :  function+ EOF -> ^(ROOT function+)
  ;

function
  :  (children+=id '.')* last=id {reverse($children, (CommonTree)$last.tree);} -> ^($last)
  ;

id 
  :  ID
  ;

ID
  :  ('a'..'z' | 'A'..'Z')+
  ;

Space
  :  ' ' {skip();}
  ;

And a little test class:

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {
    public static void main(String[] args) throws Exception {
        ANTLRStringStream in = new ANTLRStringStream("a.b.c.d.e    Stack.Overflow.Horse    singleNode");
        ReverseTreeLexer lexer = new ReverseTreeLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        ReverseTreeParser parser = new ReverseTreeParser(tokens);
        ReverseTreeParser.parse_return returnValue = parser.parse();
        CommonTree tree = (CommonTree)returnValue.getTree();
        DOTTreeGenerator gen = new DOTTreeGenerator();
        StringTemplate st = gen.toDOT(tree);
        System.out.println(st);
    }
}

which will produce an AST that looks like:

  • (image generated using http://graph.gafol.net)

For the input string:

"a.b.c.d.e    Stack.Overflow.Horse    singleNode"