Requirement:
I am trying to develop a language application using ANTLR4. The language in question is not important; what matters is that the grammar is very large (easily more than 2000 rules). I want to perform a number of operations:
- Extract various kinds of information, such as call graphs, variable names, constant expressions, etc.
- Apply any number of transformations, for example:
  - if a loop can be expanded, expand it;
  - if dead code can be eliminated, we might choose to do that;
  - we might choose to rename all variables to conform to some naming conventions.
Each of these operations can be applied independently of the others. After applying them, I want to write the result back out in a form as close as possible to the original input.
For example, we might want to eliminate loops (by expanding them) and rename variables, and then output the result in the original language's format.
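To make the "independent operations" requirement concrete, here is a minimal Java sketch over a hypothetical toy model. `Assign`, `Program`, and `Passes` are made up for illustration and are not ANTLR classes. Each transformation is a function from program to program, so any subset can be chained in any order. The dead-code pass is deliberately naive (it drops assignments to variables that are never read and ignores side effects). Note that `print` regenerates the text from the model, which is exactly where the original spacing and comments get lost.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical toy model: a program is a list of "target = expr;" statements,
// where expr is a list of tokens (identifiers, literals, operators).
record Assign(String target, List<String> expr) {}
record Program(List<Assign> body) {}

final class Passes {
    // Pass: rename every occurrence of one variable (targets and uses).
    static UnaryOperator<Program> rename(String from, String to) {
        return p -> new Program(p.body().stream()
                .map(a -> new Assign(
                        a.target().equals(from) ? to : a.target(),
                        a.expr().stream().map(t -> t.equals(from) ? to : t).toList()))
                .toList());
    }

    // Pass: naive dead-store elimination. Drops assignments whose target
    // never appears in any expression (ignores side effects; illustration only).
    static UnaryOperator<Program> dropUnread() {
        return p -> {
            Set<String> read = new HashSet<>();
            p.body().forEach(a -> read.addAll(a.expr()));
            return new Program(p.body().stream()
                    .filter(a -> read.contains(a.target()))
                    .toList());
        };
    }

    // Passes are independent: apply whichever ones were requested, in order.
    static Program run(Program p, List<UnaryOperator<Program>> passes) {
        for (UnaryOperator<Program> pass : passes) {
            p = pass.apply(p);
        }
        return p;
    }

    // Printing from the model builds text from scratch, so the original
    // whitespace, line breaks and comments are not reproduced.
    static String print(Program p) {
        return p.body().stream()
                .map(a -> a.target() + " = " + String.join(" ", a.expr()) + ";")
                .collect(Collectors.joining("\n"));
    }
}
```

With the statements `x = 1; y = 2; x = x + 1;`, running `dropUnread()` and `rename("x", "count")` in either order prints `count = 1;` and `count = count + 1;` on two lines.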
Questions:
- I see a need to build a custom tree (i.e., an AST) for this, so that each transformation can modify the tree. However, when I generate the output, I lose the convenience of the TokenStreamRewriter: I have to specify how to print every kind of node, and I lose the original formatting even in places where no transformation was applied. Does ANTLR4 provide a good way around this problem?
- Is an AST the best way to go, or should I build my own object representation? If the latter, how do I create it efficiently? Building an object representation for such a large language is a big pain, but it may be better in the long run. And again, how do I get back the original formatting?
- Is it possible to work just on the parse tree?
- Are there existing language applications that do something similar? If so, what strategy do they use?
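For reference, the formatting-preserving behaviour of the TokenStreamRewriter comes from recording edits against token positions and only applying them when the text is rendered. The sketch below is a dependency-free toy that imitates that idea; `Tok`, `ToyRewriter`, and its methods are hypothetical and are not the API of ANTLR's `org.antlr.v4.runtime.TokenStreamRewriter`. Whitespace and comments are kept as ordinary tokens (in ANTLR this corresponds to sending them to a hidden channel rather than skipping them), so untouched regions are reproduced exactly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical token: exact source text, plus whether it is "hidden"
// (whitespace or a comment, which a parser would normally ignore).
record Tok(String text, boolean hidden) {}

final class ToyRewriter {
    // Line comments, whitespace runs, words, then any single other character.
    // Every input character matches some alternative, so lexing is lossless.
    private static final Pattern LEX = Pattern.compile("//[^\\n]*|\\s+|\\w+|[^\\s\\w]");

    // A pending replacement of tokens[start..stop] by some text.
    private record Edit(int stop, String text) {}

    private final List<Tok> tokens;
    private final Map<Integer, Edit> edits = new HashMap<>();

    ToyRewriter(List<Tok> tokens) { this.tokens = tokens; }

    static List<Tok> lex(String src) {
        List<Tok> out = new ArrayList<>();
        Matcher m = LEX.matcher(src);
        while (m.find()) {
            String t = m.group();
            out.add(new Tok(t, t.isBlank() || t.startsWith("//")));
        }
        return out;
    }

    // Record an edit; the token list itself is never modified.
    void replace(int start, int stop, String text) {
        edits.put(start, new Edit(stop, text));
    }

    // Example pass: rename an identifier everywhere except inside hidden tokens.
    void rename(String from, String to) {
        for (int i = 0; i < tokens.size(); i++) {
            Tok t = tokens.get(i);
            if (!t.hidden() && t.text().equals(from)) replace(i, i, to);
        }
    }

    // Render: edited ranges emit their replacement, everything else is copied verbatim.
    String getText() {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.size(); ) {
            Edit e = edits.get(i);
            if (e != null) {
                out.append(e.text());
                i = e.stop() + 1;
            } else {
                out.append(tokens.get(i).text());
                i++;
            }
        }
        return out.toString();
    }
}
```

For the input `int x = 1;   // x counts` followed by `x  =  x + 1;`, calling `rename("x", "count")` and then `getText()` changes only the three identifier tokens; the comment and the irregular spacing come back unchanged. The limitation raised in the question remains: edits are tied to original token positions, not to a tree that later passes can inspect.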
Any input is welcome. Thanks in advance.