How to exclude headers from AST in clang?

Posted 2019-03-15 00:07

Question:

I'm generating an AST using clang. I have the following file (lambda.cpp) to parse:

#include <iostream>

void my_lambda()
{
    auto lambda = [](auto x, auto y) {return x + y;};
    std::cout << "fabricati diem"; 
}

I'm parsing it with the following command:

clang -Xclang -ast-dump -fsyntax-only lambda.cpp

The problem is that clang also parses the contents of the headers. As a result, I get a large (~3000 lines) dump full of content that is useless to me.

How can I exclude headers when generating the AST?

Answer 1:

clang-check might be useful here. It has an option, -ast-dump-filter=<string>, documented as follows:

-ast-dump-filter=<string> - Use with -ast-dump or -ast-print to dump/print only AST declaration nodes having a certain substring in a qualified name. Use -ast-list to list all filterable declaration node names.

Consider running clang-check with -ast-dump-filter=my_lambda on the sample code (lambda.cpp):

#include <iostream>

void my_lambda()
{
    auto lambda = [](auto x, auto y) {return x + y;};
    std::cout << "fabricati diem"; 
}

It dumps only the matching declaration node, FunctionDecl my_lambda 'void (void)'.

Here are the command-line arguments and the first few lines of the output:

$ clang-check -extra-arg=-std=c++1y -ast-dump -ast-dump-filter=my_lambda lambda.cpp --

FunctionDecl 0x2ddf630 <lambda.cpp:3:1, line:7:1> line:3:6 my_lambda 'void (void)'
`-CompoundStmt 0x2de1558 <line:4:1, line:7:1>
  |-DeclStmt 0x2de0960 <line:5:9, col:57>


Answer 2:

Filtering on a specific identifier with -ast-dump-filter is fine. But what if you want the AST for all identifiers in one file?

I came up with the following solution:

Add one recognizable line after the includes:

#include <iostream>
int XX_MARKER_XX = 123234; // marker line for ast-dump
void my_lambda()
...

Then dump the AST with:

clang-check -extra-arg=-std=c++1y -ast-dump lambda.cpp > ast.txt

You can then cut away everything before XX_MARKER_XX with sed:

sed -n '/XX_MARKER_XX/,$p' ast.txt | less

The result is still long, but it is much more useful, especially for bigger files.
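For readers who would rather do the cut in a small program than in sed, here is a minimal C++ sketch of the same idea: copy the dump from the first line that contains the marker through to the end. The names copy_from_first_match and cut_dump are made up for illustration, and unlike sed the sketch matches a fixed string rather than a regular expression.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Copies lines from `in` to `out`, starting with the first line that contains
// `needle` and continuing to the end of the input. Returns the number of lines
// copied. (Hypothetical helper; it mirrors what a sed range from a marker to
// the last line does, but for a fixed string.)
std::size_t copy_from_first_match(std::istream& in, std::ostream& out,
                                  const std::string& needle) {
    assert(!needle.empty());  // an empty needle would match every line
    std::string line;
    bool copying = false;
    std::size_t copied = 0;
    while (std::getline(in, line)) {
        if (!copying && line.find(needle) != std::string::npos) {
            copying = true;  // marker seen: keep this line and all that follow
        }
        if (copying) {
            out << line << '\n';
            ++copied;
        }
    }
    return copied;
}

// Convenience wrapper for dumps that are already held in a string.
std::string cut_dump(const std::string& dump, const std::string& needle) {
    std::istringstream in(dump);
    std::ostringstream out;
    copy_from_first_match(in, out, needle);
    return out.str();
}
```

Calling cut_dump(text, "XX_MARKER_XX") on the contents of ast.txt should keep the same lines as the sed command. Assuming the dump prints the file name at the first node located in lambda.cpp (as the <lambda.cpp:3:1, ...> location in the first answer's output suggests), a needle such as "<lambda.cpp:" might make the marker line unnecessary. That is an untested assumption, and it only helps when all includes come before your own declarations.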



Answer 3:

This is a problem with C++, not with clang: C++ has no notion of files, only the compilation unit (the standard calls it a translation unit). When you #include a file, all of its definitions are pasted (recursively) into your compilation unit, and at the language level there is no way to tell them apart from your own. This is what the standard expects your compiler to do.

Imagine a different scenario:

/////////////////////////////
// headertmp.h
#if defined(A)
    struct Foo {
        int bar;
    };
#elif defined(B)
    struct Foo {
        short bar;
    };
#endif

/////////////////////////////
// foobar.cpp
#ifndef A
# define B
#endif

#include "headertmp.h"

void foobar(Foo foo) {
    // do stuff to foo.bar
}

Your foobar.cpp declares a struct called Foo and a function called foobar, but headertmp.h by itself does not define any Foo unless A or B is defined. Only in the compilation unit of foobar.cpp, where the two come together, can you make sense of headertmp.h.
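The scenario can be checked in a single file. The sketch below pastes the header text inline, as a stand-in for what #include "headertmp.h" would do, and uses static_assert to record which Foo the compilation unit ends up with. The static_assert lines and the body of foobar are additions for illustration and are not part of the original example.

```cpp
#include <type_traits>

// Stand-in for the top of foobar.cpp: B is defined unless A already is.
#ifndef A
# define B
#endif

// Stand-in for the text that #include "headertmp.h" would paste in here.
#if defined(A)
struct Foo {
    int bar;
};
#elif defined(B)
struct Foo {
    short bar;
};
#endif

// Which Foo exists is decided only here, inside the compilation unit.
#if defined(A)
static_assert(std::is_same<decltype(Foo::bar), int>::value,
              "with A defined, Foo::bar is int");
#else
static_assert(std::is_same<decltype(Foo::bar), short>::value,
              "without A, B is defined and Foo::bar is short");
#endif

void foobar(Foo foo) {
    (void)foo;  // placeholder for "do stuff to foo.bar"
}
```

Compiled as is, the file takes the B branch and Foo::bar is short. Compiled with -DA on the command line, the same text yields a Foo whose bar is int. The header text alone determines neither.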

If you are interested in a subset of the declarations inside a compilation unit, you will have to extract the necessary information from the generated AST directly (similar to what a linker has to do when linking together different compilation units). Of course you can then filter the AST of this compilation unit on any metadata your parser extracts.