ANTLR4 parse tree simplification

2019-02-15 14:05发布

Is there any means to get ANTLR4 to automatically remove redundant nodes in generated parse trees?

More specifically, I've been experimenting with a grammar for GLSL and you end up with long linear sequences of "expressions" in the parse tree due to the rule forwarding needed to give the automatic handling of operator precedence.

Most of the generated tree nodes are simply "forward to the next level of precedence", so don't provide any useful syntactic information - you only really need the last expression node in each sequence (i.e. the point at which the rule forwarding stopped), or the point where it becomes an actual tree node with more than one child (i.e. an actual expression was encountered in the source) ...

I was hoping there would be an easy way to kill off the dummy intermediate expression nodes - this type of structure must be common in any grammar with operator precedence.

The basic structure of the grammar is a fairly direct clone taken from the Khronos specification for the language:

https://www.khronos.org/registry/gles/specs/3.1/es_spec_3.1.pdf

标签: antlr antlr4
2条回答
小情绪 Triste *
2楼-- · 2019-02-15 14:39

Use the Visitor implementation to access each node in sequence. Build your own tree by adding nodes to parents as they are visited. Decide at the time the node is visited whether to add it to your new tree or not. For example:

public T visitExpression(@NotNull AcParser.ExpressionContext ctx) {
        // Expressionable parent = getParent(Expressionable.class, ctx);
        // Class<? extends AcExpression> expClass = AcExpression.class;
        AcExpression obj = null;
        String text = ctx.getText();

        //do something with text or children
        for (int i=0; i<ctx.getChildCount(); i++){
            printnl(ctx.getChild(i).getText()+"/");
        }

        return visitChildren(ctx);
    }
查看更多
来,给爷笑一个
3楼-- · 2019-02-15 14:47

ANTLR v4 is able to generate code from a single recursive rule dealing with different precedence levels, if you use a grammar like this (example for basic math):

expr : '(' expr ')'
     | '-' expr
     | expr ('*'|'/') expr
     | expr ('+'|'-') expr
     | INT
     ;

ANTLR v3 was unable to do so and basically required you to write one rule per precedence level. So I'd advise you to rewrite your grammar to avoid these boilerplate rules.

Then, I think you're confusing the parse tree (aka concrete syntax tree) with the AST (abstract syntax tree). The AST is like a simplified version of the parse tree, which keeps only what's needed for your purpose. For instance, with the expr rule above, the AST wouldn't contain any node for parentheses, since the precedence is encoded in the tree itself and you usually don't need to know whether a part of a given expression was parenthesized or not.

Your program should build an AST from the parse tree and then go from there. Don't deal with parse trees directly, even if it seems convenient at first sight because the tool generates them for you. It'll quickly become cumbersome. Build your own tree structure (AST), tailored for the task at hand.

查看更多
登录 后发表回答