I need to go through a C/C++ file and extract the list of classes and methods, along with where they are located in the file.
Is libclang the best option? Or is it "too much" for the task?
Would it be better to just look for matching pairs of braces?
If libclang is the right choice, is there a way to invoke it from C#?
Thanks!
You could consider ctags, which is available on many platforms. Its output is easy to parse and contains all the information you need.
To answer your question I had to dig through its many options, and after a while I found the right combination. For example:
produces this output
(Note: -x is used here only to make the output easy for a person to inspect.)
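If you want to consume that listing from another program, it is simple to split into fields. Below is a minimal Python sketch, assuming the usual `-x` layout (tag name, kind, line number, file, source text, separated by whitespace) and assuming tag names and paths contain no spaces. `Tag`, `parse_ctags_xref`, `classes_and_methods` and the set of kinds kept are hypothetical names chosen for illustration; check the kind names against what your ctags version actually prints. (In Exuberant ctags, method declarations are only listed if prototypes are enabled, e.g. with `--c++-kinds=+p`.) The same line splitting translates directly to C#.

```python
from __future__ import annotations

from typing import NamedTuple


class Tag(NamedTuple):
    name: str
    kind: str
    line: int
    path: str


def parse_ctags_xref(text: str) -> list[Tag]:
    """Parse `ctags -x` style lines: name, kind, line number, file, source text.

    Assumes names and file paths contain no whitespace; the trailing source
    text may contain spaces because we split into at most five fields.
    """
    tags = []
    for raw in text.splitlines():
        fields = raw.split(None, 4)
        # Skip blank lines or anything that does not look like a tag entry.
        if len(fields) < 4 or not fields[2].isdigit():
            continue
        name, kind, line, path = fields[:4]
        tags.append(Tag(name, kind, int(line), path))
    return tags


# Hypothetical set of kinds to keep; methods usually show up as
# "function" (definitions) or "prototype" (declarations).
WANTED_KINDS = {"class", "struct", "function", "prototype"}


def classes_and_methods(tags: list[Tag]) -> list[Tag]:
    """Keep only the tag kinds of interest."""
    return [t for t in tags if t.kind in WANTED_KINDS]


# Made-up input in the -x layout, for illustration only.
sample = (
    "Shape            class         3 shape.h          class Shape {\n"
    "Shape::area      function     10 shape.h          double Shape::area() const {\n"
    "PI               macro         1 shape.h          #define PI 3.14159\n"
)

for tag in classes_and_methods(parse_ctags_xref(sample)):
    print(f"{tag.kind:<10} {tag.name:<12} {tag.path}:{tag.line}")
```

Splitting into at most five fields keeps the source-text column intact, so you can still show the original declaration next to each location if you need it.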
To do this well, you really need something that contains a full C++ parser.
Our DMS Software Reengineering Toolkit with its C++ Front End could be used for this. It can provide the precise entity declarations including types, their context (class/namespace/...), and precise file positions. DMS provides access to all this information as a set of ASTs and related symbol tables; you write custom code to navigate to and extract what you want.
Depending on your needs, you may find that the information you want is difficult to process using vanilla C#. The type information in its full glory is pretty complex, because C++ is a complex language. If you want to process that information, you'll want to "stay inside" DMS, where all the machinery to do that is present. If all you want is the names and type information as text strings, you can get DMS to prettyprint this data in that form; it has standard libraries supporting such activities. An intermediate answer would be to export the data in XML format: DMS directly supports exporting arbitrary AST fragments, and supports writing type information out as XML only indirectly, but that would not be hard to customize.
EDIT (in response to the OP's comment on another answer): DMS can provide precise information about both the method signature and the method body. It has full AST and type information for both.
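To see why simply matching braces (the question's third option) is fragile, here is a deliberately naive Python sketch that finds `class`/`struct` headers with a regular expression and then counts `{` and `}` to find where each body ends. `naive_class_spans` and `line_of` are hypothetical helpers for illustration only. The sketch ignores string and character literals, comments and the preprocessor, and a single `"}"` inside a string literal is enough to make it report the wrong end line; braces in comments or macros cause the same kind of error.

```python
from __future__ import annotations

import re

# Matches "class Name ... {" or "struct Name ... {" without crossing a ';'
# (so forward declarations such as "class Foo;" are skipped).
CLASS_HEAD = re.compile(r"\b(?:class|struct)\s+(\w+)[^;{]*\{")


def line_of(source: str, index: int) -> int:
    """1-based line number of the character at `index`."""
    return source.count("\n", 0, index) + 1


def naive_class_spans(source: str) -> list[tuple[str, int, int]]:
    """Return (class name, first line, last line) by counting braces only.

    Deliberately naive: braces inside string/char literals, comments or
    macros are counted too, which is exactly why this approach breaks.
    """
    spans = []
    for match in CLASS_HEAD.finditer(source):
        depth = 0
        # Start scanning at the opening '{' that ended the match.
        for i in range(match.end() - 1, len(source)):
            if source[i] == "{":
                depth += 1
            elif source[i] == "}":
                depth -= 1
                if depth == 0:
                    start = line_of(source, match.start())
                    spans.append((match.group(1), start, line_of(source, i)))
                    break
    return spans


simple = """class A {
  void f() { }
};
"""

tricky = """class B {
  const char* close() { return "}"; }
  void g() { }
};
"""

print(naive_class_spans(simple))  # [('A', 1, 3)], which is correct
print(naive_class_spans(tricky))  # [('B', 1, 2)], although B really ends on line 4
```

Fixing this properly means handling literals, comments, macros and templates, at which point you are writing a C++ lexer and parser anyway.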
I'm not sure which is the best option, but you could also take a look at GCC-XML or Mono/CXXI. The latter uses GCC-XML internally, but also provides C# interfaces to the C++ class definitions.
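GCC-XML writes its result as a flat list of XML elements that refer to each other by id, so pulling out classes, methods and their locations takes only a few lines of code. The Python sketch below assumes the layout as I recall it: `Class`/`Struct` and `Method` elements carrying `name`, `context` (id of the enclosing scope), `file` (id of a `File` element) and `line` attributes. Treat those names as assumptions and verify them against output from your own GCC-XML version; the sample document is made up, and `gccxml_classes_and_methods` is a hypothetical helper. The same approach carries over to any XML API in C#.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Assumed GCC-XML element names; constructors, destructors and operators
# may use other element names in real output.
TYPE_TAGS = {"Class", "Struct"}
METHOD_TAGS = {"Method"}


def gccxml_classes_and_methods(xml_text: str) -> list[tuple[str, str, str, int]]:
    """Return (kind, qualified name, file name, line) for classes and methods."""
    root = ET.fromstring(xml_text)
    files = {f.get("id"): f.get("name") for f in root.iter("File")}
    by_id = {el.get("id"): el for el in root}

    results = []
    for el in root:
        if el.tag not in TYPE_TAGS | METHOD_TAGS:
            continue
        name = el.get("name")
        if el.tag in METHOD_TAGS:
            # Qualify the method with its enclosing class via the context id.
            owner = by_id.get(el.get("context"))
            if owner is not None:
                name = f"{owner.get('name')}::{name}"
        path = files.get(el.get("file"), "?")
        results.append((el.tag, name, path, int(el.get("line", "0"))))
    return results


# Made-up sample in the assumed GCC-XML layout.
sample = """<GCC_XML>
  <Namespace id="_1" name="::" members="_2 "/>
  <Class id="_2" name="Shape" context="_1" file="f1" line="3" members="_3 "/>
  <Method id="_3" name="area" context="_2" file="f1" line="5"/>
  <File id="f1" name="shape.h"/>
</GCC_XML>"""

for kind, name, path, line in gccxml_classes_and_methods(sample):
    print(f"{kind:<7} {name:<12} {path}:{line}")
```

Resolving ids through a dictionary first keeps the lookup independent of element order, since a `File` element may appear after the classes that reference it.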
libclang is a C library and thus should be usable from .NET via P/Invoke, but it might be quite tedious to re-declare all the necessary functions and structures in C#.
IMO it's better to use a full parser. You could use ANTLR: there is a C/C++ grammar for it, and it can generate parsers in C#.
Another angle would be to create an extension for Visual Studio.
If you want to use Clang, I recommend you take a look at this page. It demonstrates how to get all virtual methods from a file. Once you understand this simple example, you can create more complex so-called matchers.