How to parse C++ source in Python?

2020-07-04 07:07发布

We want to parse our huge C++ source tree to gain enough info to feed to another tool to make diagrams of class and object relations, discern the overall organization of things etc.

My best try so far is a Python script that scans all .cpp and .h files, runs regex searches to try to detect class declarations, methods, etc. We don't need a full-blown analyzer to capture every detail, or some heavy UML diagram generator - there's a lot of detail we'd like to ignore and we're inventing new types of diagrams. The script sorta works, but by gosh it's true: C++ is hard to parse!

So I wonder what tools exist for extracting the info we want from our sources? I'm not a language expert, and don't want something with a steep learning curve. Something we low-brow blue-collar programmer grunts can use :P

Python is preferred as one of the standard languages here, but it's not essential.

7条回答
欢心
2楼-- · 2020-07-04 07:41

If you can bring yourself to run this analysis using a Windows-platform application, save yourself a lot of time and trouble, and spend $200 on Enterprise Architect by Sparx Systems (I have no affiliation with this company, just a satisfied customer). (Note: this should not be confused with Microsoft's own "Enterprise Architect" bundle for Visual Studio.)

EA can reverse-engineer a number of languages, including C++, C, Java, and Python, generating some very nice UML class diagrams. (EA comes in a number of different packages, Desktop is the cheapest but you have to by Professional, the 2nd cheapest, to get the code engineering feature included.) I also like the integration between the generated class diagrams and sequence diagramming, where you can drag a line between object lifelines and a menu of defined methods is presented to you based on the class definition of the target object. At my former consulting business, we used this tool quite a bit to develop system architectural proposals which we then included as part of our project bid (just copy/paste the diagram into a Word doc). It wont take long to make back your $200.

查看更多
等我变得足够好
3楼-- · 2020-07-04 07:46

I'll simply recommend Clang.

It's a C++ library-based compiler designed with ease of reuse in mind. It notably means that you can use it solely for parsing and generating an Abstract Syntax Tree. It takes care of all the tedious operator overloading resolution, template instantiation and so on.

Clang exports a C-based interface, which is extended with Python Bindings. The interface is normally quite rich, but I haven't use it. Anyway, contributions are welcome if you wish to help extending it.

查看更多
神经病院院长
4楼-- · 2020-07-04 07:47

Can you run a preprocessing step? Doxygen parses most C++ syntax and creates xml with all the relationships. Compilers also create debug databases (typically dwarf format from gcc and codeview format from MSC).

查看更多
戒情不戒烟
5楼-- · 2020-07-04 07:49

You could check out GccXML and OpenC++, as well as doxygen.

查看更多
Melony?
6楼-- · 2020-07-04 07:51

From what you say of our requirements, Tony's answer of GccXML will probably be the best option. If that doesn't work, you could try to generate an outline of your program with cscope or ctags, and then work your way to the info you want from it's output.

查看更多
乱世女痞
7楼-- · 2020-07-04 07:55

I've had good experience with PLY:

http://www.dabeaz.com/ply/

But this requires some experience with lex and yacc

查看更多
登录 后发表回答