Find all references of specific function declarati

Posted 2019-03-27 09:47

Question:

I am trying to find (line and column position) all the references of a specific function declaration when parsing a C++ source file via libclang in Python.

For example:

#include <iostream>
using namespace std;

int addition (int a, int b)
{
  int r;
  r=a+b;
  return r;
}

int main ()
{
  int z, q;
  z = addition (5,3);
  q = addition (5,5);
  cout << "The first result is " << z;
  cout << "The second result is " << q;
}

So, for the source file above, given the function declaration of addition in line 5, I would like find_all_function_decl_references (see below) to return the references to addition at lines 15 and 16.

I have tried this (adapted from here)

import clang.cindex
import ccsyspath

index = clang.cindex.Index.create()
translation_unit = index.parse(filename, args=args)

for node in translation_unit.cursor.walk_preorder():
    node_definition = node.get_definition()

    if node.location.file is None:
        continue
    if node.location.file.name != sourcefile:
        continue
    if node_definition is None:
        pass
    if node.kind.name == 'FUNCTION_DECL':
        if node.kind.is_reference():
          find_all_function_decl_references(node_definition.displayname)  # TODO

Another approach could be to store all the function declarations found on a list and run the find_all_function_decl_references method on each.

Does anyone have any idea of how to approach this? What would this find_all_function_decl_references method look like? (I am very new to libclang and Python.)

I have seen this, where the function find_typerefs finds all references to some type, but I am not sure how to adapt it to my needs.

Ideally, I would like to be able to fetch all references for any declaration; not only functions but variable declarations, parameter declarations (e.g. the a and b in the example above in line 7), class declarations etc.

EDIT Following Andrew's comment, here are some details regarding my setup specifications:

  • LLVM 3.8.0-win64
  • libclang-py3 3.8.1
  • Python 3.5.1 (on Windows; I assume CPython)
  • For the args, I tried both the ones suggested in the answer here and the ones from another answer.

*Please note: given my limited programming experience, I would appreciate an answer with a brief explanation of how it works.

Answer 1:

The thing that really makes this problem challenging is the complexity of C++.

Consider what is callable in C++: functions, lambdas, the function call operator, member functions, template functions and member template functions. So in the case of just matching call expressions, you'd need to be able to disambiguate these cases.

Furthermore, libclang doesn't offer a perfect view of the clang AST (some nodes don't get exposed completely, particularly some nodes related to templates). Consequently, it's possible (even likely) that an arbitrary code fragment would contain some construct where libclang's view of the AST is insufficient to associate the call expression with a declaration.

However, if you're prepared to restrict yourself to a subset of the language it may be possible to make some headway - for example, the following sample tries to associate call sites with function declarations. It does this by doing a single pass over all the nodes in the AST matching function declarations with call expressions.

from clang.cindex import *

def is_function_call(funcdecl, c):
    """ Determine whether a call-expression cursor refers to a particular function declaration
    """
    defn = c.get_definition()
    return (defn is not None) and (defn == funcdecl)

def fully_qualified(c):
    """ Retrieve a fully qualified function name (with namespaces)
    """
    res = c.spelling
    c = c.semantic_parent
    while c.kind != CursorKind.TRANSLATION_UNIT:
        res = c.spelling + '::' + res
        c = c.semantic_parent
    return res

def find_funcs_and_calls(tu):
    """ Retrieve lists of function declarations and call expressions in a translation unit
    """
    filename = tu.cursor.spelling
    calls = []
    funcs = []
    for c in tu.cursor.walk_preorder():
        if c.location.file is None:
            pass
        elif c.location.file.name != filename:
            pass
        elif c.kind == CursorKind.CALL_EXPR:
            calls.append(c)
        elif c.kind == CursorKind.FUNCTION_DECL:
            funcs.append(c)
    return funcs, calls

idx = Index.create()
args =  '-x c++ --std=c++11'.split()
tu = idx.parse('tmp.cpp', args=args)
funcs, calls = find_funcs_and_calls(tu)
for f in funcs:
    print(fully_qualified(f))
    for c in calls:
        if is_function_call(f, c):
            # print the call site's source location rather than the cursor object
            print('-', c.location)
    print()

To show how well this works, you need a slightly more challenging example to parse:

// tmp.cpp
#include <iostream>
using namespace std;

namespace impl {
    int addition(int x, int y) {
        return x + y;
    }

    void f() {
        addition(2, 3);
    }
}

int addition (int a, int b) {
  int r;
  r=a+b;
  return r;
}

int main () {
  int z, q;
  z = addition (5,3);
  q = addition (5,5);
  cout << "The first result is " << z;
  cout << "The second result is " << q;
}

And I get the output:

impl::addition
- <SourceLocation file 'tmp.cpp', line 10, column 9>

impl::f

addition
- <SourceLocation file 'tmp.cpp', line 22, column 7>
- <SourceLocation file 'tmp.cpp', line 23, column 7>

main
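Since the question asks for line and column positions, the matching step above can also be sketched as a helper that returns plain tuples instead of printing. The name call_positions is hypothetical (it is not part of the gist); it only relies on get_definition() and on the line and column attributes of a cursor's location, the same calls the listing above already uses.

```python
def call_positions(funcdecl, calls):
    """Return sorted (line, column) pairs of the calls that resolve to funcdecl.

    `funcdecl` and the items of `calls` are expected to behave like
    clang.cindex.Cursor objects, as returned by find_funcs_and_calls above.
    """
    positions = []
    for call in calls:
        defn = call.get_definition()
        # Calls libclang could not resolve yield None and are skipped.
        if defn is not None and defn == funcdecl:
            positions.append((call.location.line, call.location.column))
    return sorted(positions)
```

With the funcs and calls lists from the listing above, call_positions(f, calls) would be called once per function declaration f. Note that get_definition() only finds functions whose definition is in the parsed translation unit, so calls to functions that are merely declared are not matched.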

Scaling this up to consider more types of declarations would (IMO) be non-trivial and an interesting project in its own right.
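One possible starting point for that, offered as a rough sketch rather than a tested solution: instead of comparing every call against every function, walk the tree once and group each cursor by the declaration it refers to. This assumes the Python bindings in use expose the Cursor.referenced property (I have not checked this against the 3.8 bindings on Windows). The names collect_references and location_key are hypothetical helpers, and constructs that libclang does not expose fully (templates in particular) may simply be missing from the result.

```python
from collections import defaultdict


def location_key(cursor):
    """Return (file name, line, column) for a cursor, or None if it has no file."""
    loc = cursor.location
    if loc.file is None:
        return None
    return (loc.file.name, loc.line, loc.column)


def collect_references(root, filename):
    """Map each referenced declaration to the sorted positions that refer to it.

    `root` is expected to behave like a clang.cindex.Cursor (for example
    tu.cursor); only cursors located in `filename` are reported.
    Keys are (spelling, file, line, column) of the declaration,
    values are lists of (line, column) pairs.
    """
    found = defaultdict(set)
    for cursor in root.walk_preorder():
        here = location_key(cursor)
        if here is None or here[0] != filename:
            continue
        target = cursor.referenced
        if target is None:
            continue
        there = location_key(target)
        # A declaration refers to itself, so skip targets at the same position.
        if there is None or there == here:
            continue
        # A set drops duplicates: a call expression and the name expression
        # nested inside it can point at the same declaration from one position.
        found[(target.spelling,) + there].add(here[1:])
    return {decl: sorted(uses) for decl, uses in found.items()}
```

It would be called as collect_references(tu.cursor, tu.cursor.spelling). Keying on the declaration's position rather than its name keeps impl::addition and the global addition apart, and the same loop picks up variables and parameters wherever libclang reports a reference for them.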

Addressing comments

Given that there are some questions about whether the code in this answer produces the results I've provided, I've added a gist of the code (that reproduces the content of this question) and a very minimal vagrant machine image that you can use to experiment with. Once the machine is booted you can clone the gist, and reproduce the answer with the commands:

git clone https://gist.github.com/AndrewWalker/daa2af23f34fe9a6acc2de579ec45535 find-func-decl-refs
cd find-func-decl-refs
export LD_LIBRARY_PATH=/usr/lib/llvm-3.8/lib/ && python3 main.py