I'm generating an AST using clang. I have the following file (lambda.cpp) to parse:
#include <iostream>
void my_lambda()
{
        auto lambda = [](auto x, auto y) {return x + y;};
        std::cout << "fabricati diem";
}
I'm parsing it with the following command:
clang -Xclang -ast-dump -fsyntax-only lambda.cpp
The problem is that clang also parses the contents of the headers. As a result, I get a fairly large (~3000 lines) dump, most of which is useless to me.
How can I exclude headers when generating the AST?
clang-check might be useful here. It has an option, -ast-dump-filter=<string>, documented as follows:
-ast-dump-filter=<string> - Use with -ast-dump or -ast-print to dump/print only AST declaration nodes having a certain substring in a
qualified name. Use -ast-list to list all filterable declaration node
names.
Suppose clang-check is run with -ast-dump-filter=my_lambda on the sample code (lambda.cpp):
#include <iostream>
void my_lambda()
{
        auto lambda = [](auto x, auto y) {return x + y;};
        std::cout << "fabricati diem";
}
It dumps only the matching declaration node, FunctionDecl my_lambda 'void (void)'.
Here are the command-line arguments and the first few lines of the output:
$ clang-check -extra-arg=-std=c++1y -ast-dump -ast-dump-filter=my_lambda lambda.cpp --
FunctionDecl 0x2ddf630 <lambda.cpp:3:1, line:7:1> line:3:6 my_lambda 'void (void)'
`-CompoundStmt 0x2de1558 <line:4:1, line:7:1>
  |-DeclStmt 0x2de0960 <line:5:9, col:57>
Filtering on a specific identifier with -ast-dump-filter works fine. But what if you want the AST for all identifiers in one file?
I came up with the following solution:
Add one recognizable line after the includes:
#include <iostream>
int XX_MARKER_XX = 123234; // marker line for ast-dump
void my_lambda()
...
Then dump the AST with:
clang-check -extra-arg=-std=c++1y -ast-dump lambda.cpp > ast.txt
You can then easily cut away everything before XX_MARKER_XX with sed:
cat ast.txt | sed -n '/XX_MARKER_XX/,$p' | less
Still a lot, but much more useful with bigger files.
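If sed isn't available, the same cut is easy to script. Here is a minimal sketch in Python, assuming the dump was saved as ast.txt and the marker is named XX_MARKER_XX as above; the function name cut_before_marker is made up for illustration:

```python
def cut_before_marker(lines, marker="XX_MARKER_XX"):
    """Return the lines from the first one containing `marker` to the end.

    Behaves like sed -n '/XX_MARKER_XX/,$p': if the marker never
    appears, nothing is returned.
    """
    for index, line in enumerate(lines):
        if marker in line:
            return lines[index:]
    return []


def cut_file(path="ast.txt", marker="XX_MARKER_XX"):
    """Read a saved dump and return only the part from the marker onward."""
    with open(path, encoding="utf-8") as dump:
        return cut_before_marker(dump.readlines(), marker)
```

Calling print("".join(cut_file())) should show the same lines as the sed pipeline, marker declaration included.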
This is a problem with C++, not with clang: there are no files in C++, there is just the compilation unit. When you #include a file, you pull all of the definitions in that file (recursively) into your compilation unit, and the language itself gives you no way to tell them apart (this is what the standard expects your compiler to do).
Imagine a different scenario:
/////////////////////////////
// headertmp.h
#if defined(A)
struct Foo {
    int bar;
};
#elif defined(B)
struct Foo {
    short bar;
};
#endif

/////////////////////////////
// foobar.cpp
#ifndef A
# define B
#endif
#include "headertmp.h"
void foobar(Foo foo) {
    // do stuff to foo.bar
}
The compilation unit for foobar.cpp declares a struct called Foo and a function called foobar, but headertmp.h by itself doesn't define any Foo unless A or B is defined. Only in the compilation unit of foobar, where the two come together, can you make sense of headertmp.h.
If you are interested in a subset of the declarations inside a compilation unit, you will have to extract the necessary information from the generated AST directly (similar to what a linker has to do when linking together different compilation units). Of course you can then filter the AST of this compilation unit on any metadata your parser extracts.
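As one way to do that kind of filtering on the text dump, here is a rough Python sketch that keeps only the top-level declarations whose start location is in a given file. It relies on assumptions about the dump layout rather than on any documented interface: that children of the root start with "|-" or "`-" in column 0, that a location names its file only when the file differs from the previously printed one (otherwise it is printed as line:N:M or col:M), and that paths contain no spaces or colons. The dump is a debugging aid and its format may differ between clang versions, so treat this as a starting point. The name keep_main_file_decls is hypothetical.

```python
import re

# Matches "path:LINE:COL", "line:LINE:COL" and "col:COL" location tokens.
# Assumption: paths contain no whitespace, commas, colons or angle brackets.
LOC = re.compile(r"([^\s<>,:]+):\d+(?::\d+)?")


def keep_main_file_decls(dump_lines, main_file):
    """Keep top-level declaration subtrees that start in `main_file`."""
    kept = []
    current_file = None  # file named by the most recently printed location
    keeping = False      # are we inside a subtree that should be kept?
    for line in dump_lines:
        tokens = [m.group(1) for m in LOC.finditer(line)]
        files = [t for t in tokens if t not in ("line", "col")]
        if line.startswith(("|-", "`-")):
            # New top-level declaration: decide from its first location.
            if not tokens:
                decl_file = None          # e.g. <<invalid sloc>> builtins
            elif tokens[0] in ("line", "col"):
                decl_file = current_file  # same file as the previous location
            else:
                decl_file = tokens[0]
            keeping = decl_file == main_file
        elif not line.startswith(("| ", "  ")):
            keeping = False               # root line or unrelated output
        if files:
            current_file = files[-1]
        if keeping:
            kept.append(line)
    return kept
```

The main_file argument has to be spelled the way it appears in the dump, which is normally the way it was passed on the command line (lambda.cpp in the question).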