How can I find all member field read/writes using

Posted 2020-06-17 05:24

Question:

Given C++ source code, I want to find the class fields that each function reads and writes. What is the best way of doing this using the Clang frontend?

(I'm not asking for a detailed explanation of all the steps; however a starting point for an efficient solution would be great.)

So far I have tried parsing statements with RecursiveASTVisitor, but keeping track of node connections is difficult. I also cannot figure out how to track something like the following:

int& x = m_int_field;
x++;

This clearly modifies m_int_field, but that is impossible to know from a single Stmt, so AST traversal by itself seems insufficient.

A bonus for me would be the ability to count fields and sub-fields separately (e.g. accessing three fields of a member struct).

Example:

#include <string>

typedef struct Y {
    int m_structfield1;
    float m_structfield2;
    Y () {
        m_structfield1 = 0;
        m_structfield2 = 1.0f;
    }
} Y;
class X {
    int m_field1;
    std::string m_field2;
    Y m_field3;
public:
    X () : m_field2("lel") {}
    virtual ~X() {}
    void func1 (std::string s) {
        m_field1 += 2;
        m_field2 = s;
    }
    int func2 () {
        return m_field1 + 5;
    }
    void func3 (Y& y) {
        int& i = m_field1;
        y.m_structfield2 = 1.2f + i++;
    }
    int func4 () {
        func3 (m_field3);
        return m_field3.m_structfield1;
    }
};

should return

X::X() -> m_field1 (w), m_field3.m_structfield1 (w), m_field3.m_structfield2 (w)
X::func1(std::string) -> m_field1 (r+w), m_field2 (w)
X::func2() -> m_field1 (r)
X::func3(Y&) -> m_field1 (r+w)
X::func4() -> m_field1 (r+w), m_field3.m_structfield2 (w), m_field3.m_structfield1 (r)

We can assume for simplicity that there is no inheritance.

Answer 1:

I've been collecting some examples of analyzing code with Clang's AST matchers. One of the example applications there, StructFieldUser, reports which fields of a struct get read or written, and the function in which each access happens. It's different from what you're looking for, but it might be a useful point of reference: it demonstrates extracting and recording this kind of information, and it illustrates how to put all the pieces together.

A good place to start with AST matchers in general is this post by Eli Bendersky.

To get a feel for the matchers that would solve your problem, you might practice with clang-query:

$ clang-query example.cpp --    # the two dashes mean no compilation db
clang-query> let m1 memberExpr()
clang-query> m m1

Match #1:

/path/example.cpp:9:9: note: "root" binds here
        m_structfield1 = 0;
        ^~~~~~~~~~~~~~

Match #2:

/path/example.cpp:10:9: note: "root" binds here
        m_structfield2 = 1.0f;
        ^~~~~~~~~~~~~~
...
11 matches.

Then you can start to connect to other nodes using traversal matchers. This lets you capture related context, like the function or class method in which the reference is made. Adding bind expressions to the node matchers will help you see exactly what is getting matched. Binding nodes will also give access to the nodes in callbacks.

clang-query> let m2 memberExpr(hasAncestor(functionDecl().bind("fdecl"))).bind("mexpr")
clang-query> m m2

Match #1:

/path/example.cpp:8:5: note: "fdecl" binds here
    Y () {
    ^~~~~~
/path/example.cpp:9:9: note: "mexpr" binds here
        m_structfield1 = 0;
        ^~~~~~~~~~~~~~
/path/example.cpp:9:9: note: "root" binds here
        m_structfield1 = 0;
        ^~~~~~~~~~~~~~

Match #2:

/path/example.cpp:8:5: note: "fdecl" binds here
    Y () {
    ^~~~~~
/path/example.cpp:10:9: note: "mexpr" binds here
        m_structfield2 = 1.0f;
        ^~~~~~~~~~~~~~
/path/example.cpp:10:9: note: "root" binds here
        m_structfield2 = 1.0f;
        ^~~~~~~~~~~~~~
...

It can take some work to learn how to pick up the exact nodes you need. Note that the matchers above don't pick up the initialization in X::X(). Looking at the AST from

clang-check -ast-dump example.cpp -- 

shows that those nodes are not MemberExpr nodes; they're CXXCtorInitializer nodes, so the cxxCtorInitializer matcher is needed to get those nodes. Multiple matchers are probably needed to find all the different nodes.
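
Once the matcher callbacks give you (function, field path, access kind) records, merging them into the per-function report from the question is plain bookkeeping. Below is a minimal sketch of that recording step only. It does not use Clang at all, and the names Access, AccessLog, record and report are hypothetical, made up for illustration. It assumes each callback can already supply the qualified function signature and the field path as strings.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical bookkeeping types; none of these names come from Clang.
enum Access : unsigned { Read = 1, Write = 2 };

class AccessLog {
public:
    // Record one access. Repeated reads and writes of the same field
    // within the same function are OR-ed into a single bit mask.
    void record(const std::string& function, const std::string& fieldPath,
                Access kind) {
        log_[function][fieldPath] |= kind;
    }

    // Render one line per function, for example
    // "X::func2() -> m_field1 (r)". Functions and fields come out in
    // the sorted order of std::map, not in source order.
    std::string report() const {
        std::ostringstream out;
        for (const auto& fn : log_) {
            out << fn.first << " -> ";
            bool first = true;
            for (const auto& field : fn.second) {
                if (!first) out << ", ";
                first = false;
                out << field.first << " (" << label(field.second) << ")";
            }
            out << "\n";
        }
        return out.str();
    }

private:
    // Map the accumulated bit mask to the labels used in the question.
    static const char* label(unsigned mask) {
        if (mask == (Read | Write)) return "r+w";
        return mask == Write ? "w" : "r";
    }

    // function signature -> (field path -> access mask)
    std::map<std::string, std::map<std::string, unsigned>> log_;
};
```

Using the full path as the key (for example "m_field3.m_structfield1") keeps sub-fields separate from their parent field, which is one way to get the separate counts mentioned in the question. Whether an access is a read or a write still has to be decided by the matchers; this sketch only stores the result.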