Question:

I'm looking for a way to parse C++ code to retrieve some basic information about classes. I don't actually need much information from the code itself, but I do need it to handle things like macros and templates. In short, I want to extract the "structure" of the code: what you would show in a UML diagram.

For each class/struct/union/enum/typedef in the code base, all I need (after templates & macros have been handled) is:

Their name
The namespace in which they live
The fields contained within (name of type, name of field and access restrictions, such as private/mutable/etc)
Functions contained within (return type, name, parameters)
The declaring file
Line/column numbers (or byte offset in file) where the definition of this data begins

The actual instructions in the code are irrelevant for my purposes.

I'm anticipating a lot of people saying I should just use a regex for this (or even Flex & Bison), but those aren't really viable options, as I need the preprocessor and templates handled properly.

Answer 1:

Sounds like a job for GCC-XML in combination with the C++ XML library or XML-friendly scripting language of your choice.

Answer 2:

Elsa: The Elkhound-based C/C++ Parser,
clang: a C language family frontend for LLVM/Clang Static Analyzer,
ANTLR Parser Generator Grammar List (search for C++, there is more than one grammar),
OpenC++ (adds reflection capabilities to C++),
Stratego XT (full programs transformation - see CodeBoost, which for parsing uses OpenC++ just mentioned, for an example application to C++ programs),
Parsing C++ at nobugs.org (not a parser but interesting bits of information; in particular Edward D. Willink's "Meta-Compilation for C++" PhD thesis and Mike Dimmick's overview of his attempt to parse C++).

See also Ira Baxter here, where he cites his own product.

Warning: mind you, of these only Elsa "... I hear does a fairly good job ..." at constructing a symbol table, which according to Ira Baxter is necessary for the OP's original intent (see the comments on this answer; I quote him because he is an expert in the field).

Answer 3:

Running Doxygen on the code would give you most of that, wouldn't it?

In what format do you want the output?

Answer 4:

Exuberant Ctags will give you most of what you need; it's usually used by editors to provide code navigation.
It may choke on some templates, though...
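
To give an idea of what you would get out of it, here is a minimal sketch that reads one line of a tags file into a dictionary. It assumes the extended tags format (name, file, address, then `;"` followed by tab-separated extension fields) and that the file was generated with extra fields enabled, for example something like `ctags --fields=+naS`. The sample line and the `parse_tag_line` helper are made up for illustration.

```python
def parse_tag_line(line):
    """Parse one line of an extended-format tags file into a dict.

    Returns None for the "!_TAG_" pseudo-tag header lines.
    """
    line = line.rstrip("\n")
    if line.startswith("!_TAG_"):
        return None

    # Everything after ';"<TAB>' is the list of extension fields.
    head, sep, ext = line.partition(';"\t')
    name, path, address = head.split("\t", 2)
    entry = {"name": name, "file": path, "address": address}

    if sep:
        for field in ext.split("\t"):
            key, colon, value = field.partition(":")
            if colon:
                # key:value fields such as line:12 or access:private
                entry[key] = value
            else:
                # A bare field with no colon is the kind (c, m, f, ...)
                entry["kind"] = field
    return entry


# Hypothetical sample line, hand-written to imitate ctags output.
sample = 'x_\tpoint.h\t/^  double x_;$/;"\tm\tline:12\tclass:geo::Point\taccess:private\n'
entry = parse_tag_line(sample)
print(entry["name"], entry["kind"], entry["class"], entry["access"], entry["line"])
```

Note that the field values stay strings, and that types are only available as far as ctags records them, so this covers names, scopes, access and line numbers better than it covers field types.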

Answer 5:

The DMS Software Reengineering Toolkit is general-purpose program analysis and transformation machinery. Its C++ Front End builds on DMS to provide full-featured C++ parsing for a variety of common C++ dialects, can process a set of C++ classes simultaneously, and constructs full name/type/access information that you can use any way you want. Information is tagged with its precise origin (file/line/column). It includes a full preprocessor.

You are right; regex can't even come close to this.

Answer 6:

You can easily get macros expanded by just running the preprocessor (cpp) on the source. Templates are not that easy, since template instantiation happens much later.
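
A rough sketch of the macro-expansion half, driven from Python. It assumes GCC's `g++` is on the PATH and uses its `-E` flag (stop after preprocessing); the helper names `build_preprocess_command` and `preprocess` are made up here. The output keeps GCC's `# <line> "<file>"` linemarkers, which is what lets you map expanded text back to the original file and line.

```python
import subprocess


def build_preprocess_command(source, compiler="g++", defines=None, include_dirs=()):
    """Build the argument list for a preprocess-only compiler run."""
    cmd = [compiler, "-E"]  # -E: run the preprocessor only
    for name, value in (defines or {}).items():
        cmd.append(f"-D{name}={value}")  # predefine a macro
    for directory in include_dirs:
        cmd.append(f"-I{directory}")  # add an include search path
    cmd.append(source)
    return cmd


def preprocess(source, **kwargs):
    """Run the preprocessor and return the expanded source as text.

    Raises subprocess.CalledProcessError if the compiler reports an error.
    """
    result = subprocess.run(
        build_preprocess_command(source, **kwargs),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `preprocess("point.cpp", defines={"DEBUG": "1"})` would then return the expanded translation unit. It does nothing for templates; those still need a real C++ front end.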

Answer 7:

Doxygen can also produce detailed XML output by setting an option in the configuration file. It is quite thorough, and very easy to use. From the Doxygen home page:

The XML output consists of a structured "dump" of the information gathered by doxygen. Each compound (class/namespace/file/...) has its own XML file and there is also an index file called index.xml.

An XSLT script called combine.xslt is also generated and can be used to combine all XML files into a single file.

Doxygen also generates two XML schema files, index.xsd (for the index file) and compound.xsd (for the compound files). These schema files describe the possible elements, their attributes and how they are structured, i.e. they describe the grammar of the XML files and can be used for validation or to steer XSLT scripts.

In the addon/doxmlparser directory you can find a parser library for reading the XML output produced by doxygen in an incremental way (see addon/doxmlparser/include/doxmlintf.h for the interface of the library).
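
If you would rather not use that library, the compound files are easy enough to read with any XML parser. Below is a minimal Python sketch that pulls out the items the question asks for. It assumes XML output was enabled with `GENERATE_XML = YES` in the Doxyfile. The embedded `SAMPLE` is hand-written to imitate a compound file rather than real Doxygen output, and the element and attribute names it relies on (`compounddef`, `memberdef`, `prot`, `location`, and so on) are as I understand the format, so check them against the generated compound.xsd. The `summarize` helper is made up for this example.

```python
import xml.etree.ElementTree as ET

# Hand-written sample imitating the shape of a Doxygen compound file.
SAMPLE = """\
<doxygen>
  <compounddef id="classgeo_1_1Point" kind="class" prot="public">
    <compoundname>geo::Point</compoundname>
    <sectiondef kind="private-attrib">
      <memberdef kind="variable" id="a1" prot="private" static="no" mutable="no">
        <type>double</type>
        <name>x_</name>
        <location file="point.h" line="12" column="10"/>
      </memberdef>
    </sectiondef>
    <sectiondef kind="public-func">
      <memberdef kind="function" id="a2" prot="public" static="no" const="yes">
        <type>double</type>
        <name>distance</name>
        <param>
          <type>const <ref refid="classgeo_1_1Point" kindref="compound">Point</ref> &amp;</type>
          <declname>other</declname>
        </param>
        <location file="point.h" line="8" column="12"/>
      </memberdef>
    </sectiondef>
    <location file="point.h" line="5" column="1"/>
  </compounddef>
</doxygen>
"""


def text_of(elem):
    """Flatten an element's text, including text inside nested <ref> links."""
    return "".join(elem.itertext()).strip() if elem is not None else ""


def summarize(xml_text):
    """Return one dict per compound with its fields, functions and location."""
    compounds = []
    for comp in ET.fromstring(xml_text).iter("compounddef"):
        # Splitting on the last "::" treats an enclosing class like a namespace.
        namespace, _, name = text_of(comp.find("compoundname")).rpartition("::")
        loc = comp.find("location")
        info = {
            "kind": comp.get("kind"),
            "name": name,
            "namespace": namespace,
            "file": loc.get("file") if loc is not None else None,
            "line": int(loc.get("line", 0)) if loc is not None else None,
            "fields": [],
            "functions": [],
        }
        for member in comp.iter("memberdef"):
            entry = {
                "type": text_of(member.find("type")),
                "name": text_of(member.find("name")),
                "access": member.get("prot"),
            }
            if member.get("kind") == "variable":
                entry["mutable"] = member.get("mutable") == "yes"
                info["fields"].append(entry)
            elif member.get("kind") == "function":
                entry["params"] = [
                    (text_of(p.find("type")), text_of(p.find("declname")))
                    for p in member.findall("param")
                ]
                info["functions"].append(entry)
        compounds.append(info)
    return compounds


classes = summarize(SAMPLE)
for cls in classes:
    print(cls["kind"], cls["namespace"], cls["name"], cls["file"], cls["line"])
```

For a real project you would loop over the compound files listed in index.xml and call `summarize` on the contents of each one.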