Use ANTLR to parse C++ with C#

Posted 2020-02-13 03:33

Question:

I'm trying to use ANTLR to get a C++ AST, if possible from my C# code base.

Now, the basic workflow seems clear to me: generate the .cs lexer and parser using ANTLRWorks, add them and the ANTLR references to a C# project, give it a C++ source file, and work with the resulting data structures.

However, I'm already failing at the second step. I downloaded C++ grammars from http://www.antlr.org/grammar/list (I tried "C++ grammar" by Aurelian Melinte and "C++ grammar and code tracer for ANTLR 3.2" by Ramin Zaghi) and generated the lexer and parser for C# by setting "language = CSharp3;" in the grammar's options. However, I can't get the C# project containing the parser and lexer files to compile.

Part of the problem is that I can't tell whether the fault lies with the grammar I use or with the versions that are available. There are so many different versions of ANTLR, of the C# runtimes, and of the C# targets that trying every combination seems a rather hopeless task.

That said, my current combination seems to work fine for simple cases: a small example grammar comes out with just one error ("HIDDEN" in the C# lexer needs to be changed to "Hidden", and that's it). The C++ parser/lexer, however, still gives me lots of compiler errors, mostly dealing with preprocessor directives and array declarations.

Did anyone ever manage to parse C++ with the ANTLR-generated C# files? Does anyone have any idea how this is supposed to work?

Answer 1:

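The problem is that both grammars contain embedded code, and that code is written in C++. Setting "language = CSharp3;" only changes the code ANTLR generates; the embedded actions are copied into the output as they are, so C++ ends up inside your .cs files and the C# compiler rejects it. Embedded code is very common in complex grammars, so you need to find a C++ grammar whose embedded code is written in C#, as opposed to just any grammar that parses C++. As a side note, if you are able to find one whose embedded code is in Java, you can use IKVM to use the generated parser from C#.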
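
To gauge how much target-specific code a grammar carries before trying it with another target, you could scan it for action blocks. The following is only a rough sketch in Python, not something ANTLR provides: `find_actions` and `SAMPLE` are hypothetical names made up for illustration. It assumes ANTLR 3 syntax (single-quoted literals, brace-delimited actions), skips `options` and `tokens` blocks, and does not handle comments or braces inside string literals within the action code.

```python
import re

# Matches a block label such as "@members", "@lexer::header", "options" or
# "tokens" when it is the last thing before an opening brace.
LABEL = re.compile(r"(@[\w:]+|\b(?:options|tokens))\s*$")


def find_actions(grammar_text):
    """Return (line, label, body) for each top-level brace block.

    label is e.g. '@members', or '' for an inline rule action.
    Blocks labelled 'options' or 'tokens' are skipped, since they hold
    ANTLR settings rather than target-language code.
    """
    actions = []
    i, n = 0, len(grammar_text)
    while i < n:
        ch = grammar_text[i]
        if ch == "'":
            # Skip a single-quoted grammar literal such as '{' or '\''.
            i += 1
            while i < n and grammar_text[i] != "'":
                i += 2 if grammar_text[i] == "\\" else 1
            i += 1
        elif ch == "{":
            start, depth = i, 1
            i += 1
            while i < n and depth > 0:
                if grammar_text[i] == "{":
                    depth += 1
                elif grammar_text[i] == "}":
                    depth -= 1
                i += 1
            end = i - 1 if depth == 0 else i  # tolerate an unclosed block
            match = LABEL.search(grammar_text[:start])
            label = match.group(1) if match else ""
            if label not in ("options", "tokens"):
                line = grammar_text.count("\n", 0, start) + 1
                actions.append((line, label, grammar_text[start + 1:end].strip()))
        else:
            i += 1
    return actions


# A made-up grammar that asks for C# output but embeds C++ actions.
SAMPLE = "\n".join([
    "grammar Demo;",
    "options { language = CSharp3; }",
    "@members {",
    "    std::vector<std::string> names;",
    "}",
    "decl : 'int' ID ';' { names.push_back($ID.text); } ;",
    "block : '{' decl* '}' ;",
    "ID : ('a'..'z')+ ;",
])

if __name__ == "__main__":
    for line, label, body in find_actions(SAMPLE):
        print(f"line {line} {label or '(rule action)'}: {body}")
```

Every block this reports is code that would have to be rewritten in C# by hand, which gives a quick estimate of how much porting work a given grammar implies.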

Answer 2:

The only ANTLR grammar I ever saw for C++ was abandoned by its author as being incomplete, and he was only trying for C++98 (YMMV). C++11 (and yea, verily, C++14) is here and much more complex. Building a production C++ parser is really hard, and unless you can get one that has been tested by fire, it probably doesn't work on real code.

I suggest you use Clang, the EDG C++ front end, or our DMS Software Reengineering Toolkit, all of which have robust C++ parsers. If you want to manipulate the parsed C++ for some purpose, you will want more machinery than a "mere" parser.
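
As an illustration of the Clang route, here is a minimal sketch in Python that asks the Clang driver for a JSON dump of the AST and walks it. It assumes a reasonably recent `clang++` on the PATH (the `-ast-dump=json` option is not available in old releases); `clang_ast_command`, `dump_ast` and `named_decls` are hypothetical helper names, and the only JSON fields relied on are `kind`, `name` and `inner`. The same command could be launched from C# as an external process and its output read with any JSON library.

```python
import json
import subprocess
import sys


def clang_ast_command(source_path, compiler="clang++"):
    """Build the command line that makes Clang print the AST as JSON."""
    return [compiler, "-fsyntax-only", "-Xclang", "-ast-dump=json", source_path]


def dump_ast(source_path):
    """Run Clang on one file and return the parsed JSON AST (a dict)."""
    result = subprocess.run(
        clang_ast_command(source_path),
        capture_output=True,
        text=True,
        check=True,  # raise if Clang reports an error
    )
    return json.loads(result.stdout)


def named_decls(node, kind):
    """Yield the names of all nodes of the given kind, depth first."""
    if node.get("kind") == kind and "name" in node:
        yield node["name"]
    for child in node.get("inner", []):
        yield from named_decls(child, kind)


if __name__ == "__main__" and len(sys.argv) > 1:
    ast = dump_ast(sys.argv[1])
    for name in named_decls(ast, "FunctionDecl"):
        print(name)
```

Note that the dump covers the whole translation unit, including everything pulled in through headers, so real use would likely need to filter nodes by source location.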