Question:
I'm building a tool for testing ANSI C applications. The idea is simple: load the code, view its control flow graph, run a test, and mark every vertex that was hit. I'm trying to build the CFG myself by parsing the code, but it gets messed up when the code is nested. GCC can dump the CFG of compiled code. I could write a parser for its output, but I need line numbers for setting breakpoints. Is there a way to get line numbers when dumping the control flow graph with -fdump-tree-cfg or -fdump-tree-vcg?
Answer 1:
For the control flow graph of a C program, you could look at existing parsers and analysis tools for C, several of which are written in Python:
- pycparser
- pyclibrary (fork of pyclibrary)
- joern
- CoFlo C/C++ control flow graph generator and analyzer
Call graphs are closely related to control flow graphs. There are several ways to create call graphs (function dependencies) for C code, and they may help you make progress with control flow graph generation. Ways to create dependency graphs in C:
Using cflow:
- cflow + pycflow2dot + dot (GPL, BSD). cflow is robust, because it can handle code which cannot compile, e.g. missing includes. If preprocessor directives are heavily used, it may need the --cpp option to preprocess the code.
- cflow + cflow2dot + dot (GPL v2, GPL v3, Eclipse Public License (EPL) v1) (note that cflow2dot needs some path fixing before it works)
- cflow + cflow2dot.bash (GPL v2, ?)
- cflow + cflow2vcg (GPL v2, GPL v2)
- enhanced cflow (GPL v2) with a list to exclude symbols from the graph
Using cscope:
- cscope (BSD)
- cscope + callgraphviz + dot + xdot
- cscope + vim CCTree (C Call-Tree Explorer)
- cscope + ccglue
- cscope + CodeQuery for C, C++, Python & Java
- cscope + Python html producer
- cscope + calltree.sh
Other tools:
- ncc (cflow-like)
- KCachegrind (KDE dependency viewer)
- Calltree
The following tools unfortunately require that the code be compilable, because they depend on output from gcc:
- CodeViz (GPL v2) (weak point: needs compilable source, because it uses gcc to dump cdepn files)
- gcc + egypt + dot (GPL v*, Perl = GPL | Artistic license, EPL v1) (egypt uses gcc to produce RTL, so it fails for any buggy source code, or even when you just want to focus on a single file from a larger project. Therefore, it is not very useful compared to the more robust cflow-based toolchains. Note that egypt by default has good support for excluding library calls from the graph, to make it cleaner.)
Also, file dependency graphs for C/C++ can be created with crowfood.
Answer 2:
So I did some more research, and it is not hard to get line numbers for the nodes. Just append the lineno modifier to either option, i.e. use -fdump-tree-cfg-lineno or -fdump-tree-vcg-lineno. It took me some time to check whether those numbers are reliable. In the VCG format, the label of each node contains two numbers: the line numbers of the start and the end of the code portion that the node represents.
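For the plain-text -fdump-tree-cfg-lineno dump, the same start/end pair per node can be recovered by scanning the file. The following is only a minimal sketch, not a full parser. It assumes that each basic block starts with a header like "<bb 2>" and that annotated statements begin with a "[file:line:column]" (or, in older releases, "[file : line]") prefix; the exact layout differs between GCC versions, so check it against the output of your compiler. The names bb_range, cfg_lines, cfg_lines_init and cfg_lines_feed are made up for this illustration, and the sketch treats the input as the dump of a single function.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_BLOCKS 256 /* hypothetical upper bound, enough for a sketch */

/* Line range of one basic block; first_line > last_line means "nothing seen". */
struct bb_range {
    int first_line;
    int last_line;
};

/* Parser state: one range per block index plus the block currently being read. */
struct cfg_lines {
    struct bb_range ranges[MAX_BLOCKS];
    int current; /* -1 while no block header has been seen */
};

/* If `line` is a block header such as "<bb 2> :", store the index and return 1. */
static int parse_bb_header(const char *line, int *bb)
{
    assert(line != NULL && bb != NULL);
    while (*line == ' ' || *line == '\t')
        line++;
    return sscanf(line, "<bb %d>", bb) == 1;
}

/* Return the line number from an assumed "[file:LINE...]" annotation, or -1. */
static int parse_location_line(const char *line)
{
    assert(line != NULL);
    const char *open = strchr(line, '[');
    if (open == NULL)
        return -1;
    const char *close = strchr(open, ']');
    const char *colon = strchr(open, ':');
    if (close == NULL || colon == NULL || colon > close)
        return -1;
    char *end;
    long value = strtol(colon + 1, &end, 10);
    if (end == colon + 1 || value <= 0 || value > INT_MAX)
        return -1;
    return (int)value;
}

static void cfg_lines_init(struct cfg_lines *s)
{
    for (int i = 0; i < MAX_BLOCKS; i++) {
        s->ranges[i].first_line = INT_MAX;
        s->ranges[i].last_line = 0;
    }
    s->current = -1;
}

/* Feed one line of the dump and update the range of the current block. */
static void cfg_lines_feed(struct cfg_lines *s, const char *line)
{
    int bb;
    if (parse_bb_header(line, &bb)) {
        s->current = (bb >= 0 && bb < MAX_BLOCKS) ? bb : -1;
        return; /* header lines may carry unrelated "[...]" fields */
    }
    int ln = parse_location_line(line);
    if (s->current < 0 || ln < 0)
        return;
    struct bb_range *r = &s->ranges[s->current];
    if (ln < r->first_line)
        r->first_line = ln;
    if (ln > r->last_line)
        r->last_line = ln;
}
```

A caller would read the dump file line by line (for example with fgets) and pass each line to cfg_lines_feed. Afterwards, ranges[n].first_line is a candidate breakpoint line for block n. File names that themselves contain a colon are not handled by this sketch.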
Answer 3:
Dynamic analysis methods
In this answer I describe a few dynamic analysis methods.
Dynamic methods actually run the program to determine the call graph.
The opposite of dynamic methods are static methods, which try to determine it from the source alone without running the program.
Advantages of dynamic methods:
- catches function pointers and virtual C++ calls. These are present in large numbers in any non-trivial software.
Disadvantages of dynamic methods:
- you have to run the program, which might be slow, or require a setup that you don't have, e.g. cross-compilation
- only functions that were actually called will show. E.g., some functions could be called or not depending on the command line arguments.
KcacheGrind
https://kcachegrind.github.io/html/Home.html
Test program:
int f2(int i) { return i + 2; }
int f1(int i) { return f2(2) + i + 1; }
int f0(int i) { return f1(1) + f2(2); }
int pointed(int i) { return i; }
int not_called(int i) { return 0; }
int main(int argc, char **argv) {
    int (*f)(int);
    f0(1);
    f1(1);
    f = pointed;
    if (argc == 1)
        f(1);
    if (argc == 2)
        not_called(1);
    return 0;
}
Usage:
sudo apt-get install -y kcachegrind valgrind
# Compile the program as usual, no special flags.
gcc -ggdb3 -O0 -o main -std=c99 main.c
# Generate a callgrind.out.<PID> file.
valgrind --tool=callgrind ./main
# Open a GUI tool to visualize callgrind data.
kcachegrind callgrind.out.1234
You are now left inside an awesome GUI program that contains a lot of interesting performance data.
On the bottom right, select the "Call graph" tab. This shows an interactive call graph that correlates to performance metrics in other windows as you click the functions.
To export the graph, right click it and select "Export Graph". The exported PNG looks like this:
From that we can see that:
- the root node is _start, which is the actual ELF entry point, and contains glibc initialization boilerplate
- f0, f1 and f2 are called as expected from one another
- pointed is also shown, even though we called it with a function pointer. It might not have been called if we had passed a command line argument.
- not_called is not shown because it didn't get called in the run, because we didn't pass an extra command line argument.
The cool thing about valgrind is that it does not require any special compilation options.
Therefore, you could use it even if you don't have the source code, only the executable.
valgrind manages to do that by running your code through a lightweight "virtual machine".
Tested on Ubuntu 18.04.
gcc -finstrument-functions + etrace
https://github.com/elcritch/etrace
-finstrument-functions adds callbacks, etrace parses the ELF file and implements all callbacks.
Unfortunately, I couldn't get it working: Why doesn't `-finstrument-functions` work for me?
The claimed output format is:
\-- main
| \-- Crumble_make_apple_crumble
| | \-- Crumble_buy_stuff
| | | \-- Crumble_buy
| | | \-- Crumble_buy
| | | \-- Crumble_buy
| | | \-- Crumble_buy
| | | \-- Crumble_buy
| | \-- Crumble_prepare_apples
| | | \-- Crumble_skin_and_dice
| | \-- Crumble_mix
| | \-- Crumble_finalize
| | | \-- Crumble_put
| | | \-- Crumble_put
| | \-- Crumble_cook
| | | \-- Crumble_put
| | | \-- Crumble_bake
This is likely the most efficient method apart from dedicated hardware tracing support, but it has the downside that you have to recompile the code.
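For reference, the callbacks that -finstrument-functions inserts calls to are __cyg_profile_func_enter and __cyg_profile_func_exit. A minimal hand-written version could look like the sketch below. This is not etrace's implementation: it only prints raw function addresses with indentation, and the trace_depth counter is a name chosen for this illustration. The hooks carry the no_instrument_function attribute so that they are not instrumented themselves.

```c
#include <assert.h>
#include <stdio.h>

/* Current call depth, maintained by the two hooks below (illustrative name). */
static int trace_depth = 0;

/* Called on entry to every function compiled with -finstrument-functions. */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void)call_site;
    /* Indent by two spaces per nesting level and print the callee address. */
    fprintf(stderr, "%*s\\-- %p\n", trace_depth * 2, "", this_fn);
    trace_depth++;
}

/* Called just before every instrumented function returns. */
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    (void)this_fn;
    (void)call_site;
    assert(trace_depth > 0);
    trace_depth--;
}
```

Put this in its own file, compile the program under test with -finstrument-functions, and link the two together. The printed addresses still have to be mapped back to function names, for example with addr2line on a non-PIE build; that symbol resolution is the part etrace automates by reading the ELF file.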