How to merge two ASTs?

I'm trying to implement a tool for merging different versions of some source code. Given two versions of the same source code, the idea would be to parse them, generate the respective Abstract Source Trees (AST), and finally merge them into a single output source keeping grammatical consistency - the lexer and parser are those of question ANTLR: How to skip multiline comments.

I know there is class ParserRuleReturnScope that helps... but getStop() and getStart() always return null :-(

Here is a snippet that illustrates how I modified my perser to get rules printed:

parser grammar CodeTableParser;

options {
    tokenVocab = CodeTableLexer;
    backtrack = true;
    output = AST;
}

@header {
    package ch.bsource.ice.parsers;
}

@members {
    private void log(ParserRuleReturnScope rule) {
        System.out.println("Rule: " + rule.getClass().getName());
        System.out.println("    getStart(): " + rule.getStart());
        System.out.println("    getStop(): " + rule.getStop());
        System.out.println("    getTree(): " + rule.getTree());
    }
}

parse
    : codeTabHeader codeTable endCodeTable eof { log(retval); }
    ;

codeTabHeader
    : comment CodeTabHeader^ { log(retval); }
    ;

...

标签： java parsing antlr abstract-syntax-tree lexer

2条回答

地球回转人心会变

2楼-- · 2019-04-29 15:48

Assuming you have the ASTs (often difficult to get in the first place, parsing real languages is often harder than it looks), you first have to determine what they have in common, and build a mapping collecting that information. That's not as easy as it looks; do you count a block of code that has moved, but is the same exact subtree, as "common"? What about two subtrees that are the same except for consistent renaming of an identifier? What about changed comments? (most ASTs lose the comments; most programmers will think this is a really bad idea).

You can build a variation of the "Longest Common Substring" algorithm to compare trees. I've used that in tools that I have built.

Finally, after you've merged the trees, now you need to regenerate the text, ideally preserving most of the layout of the original code. (Programmers hate when you change the layout they so loving produced). So your ASTs need to capture position information, and your regeneration has to honor that where it can.

0人赞添加讨论(0) 举报

smile是对你的礼貌

3楼-- · 2019-04-29 15:53

The call to log(retval) in your parser code looks like it's going to happen at the end of the rule, but it's not. You'll want to move the call into an @after block.

I changed log to spit out a message as well as the scope information and added calls to it to my own grammar like so:

script    
    @init {log("@init", retval);}
    @after {log("@after", retval);}
    : statement* EOF  {log("after last rule reference", retval);} 
        -> ^(STMTS statement*) 
    ;

Parsing test input produced the following output:

Logging from @init
    getStart(): [@0,0:4='Print',<10>,1:0]
    getStop(): null
    getTree(): null
Logging from after last rule reference
    getStart(): [@0,0:4='Print',<10>,1:0]
    getStop(): null
    getTree(): null
Logging from @after
    getStart(): [@0,0:4='Print',<10>,1:0]
    getStop(): [@4,15:15='<EOF>',<-1>,1:15]
    getTree(): STMTS

The call in the after block has both the stop and tree fields populated.

I can't say whether this will help you with your merging tool, but I think this will at least get you past the problem with the half-populated scope object.

0人赞添加讨论(0) 举报

How to merge two ASTs?

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间