Callgrind: Profile a specific part of my code

Posted 2019-02-02 10:12

Question:

I'm trying to profile (with Callgrind) a specific part of my code by removing noise and computation that I don't care about. Here is an example of what I want to do:

for (int i=0; i<maxSample; ++i) {
    //Prepare data to be processed...
    //Method to be profiled with these data
    //Post operation on the data
}

My use case is a regression test: I want to make sure that the method in question is still fast enough (for example, no more than 10% extra instructions compared with the previous implementation). This is why I'd like cleaner output from Callgrind. (I need the for loop so that enough data is processed to get a good estimate of the behavior of the method I want to profile.)
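The 10% criterion can be sketched as a small check. This is only an illustration: the name within_instruction_budget and its parameters are hypothetical, and extracting the two instruction counts from the Callgrind output is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: returns true when `current` exceeds `baseline`
// by at most `max_extra_percent` percent (10 by default).
// Integer arithmetic avoids floating-point rounding right at the threshold.
bool within_instruction_budget(std::uint64_t baseline,
                               std::uint64_t current,
                               std::uint64_t max_extra_percent = 10) {
    const std::uint64_t max_u64 = std::numeric_limits<std::uint64_t>::max();
    // Guard against overflow in the multiplications below.
    assert(current <= max_u64 / 100);
    assert(baseline <= max_u64 / (100 + max_extra_percent));
    // current <= baseline * (100 + p) / 100, rearranged to avoid division.
    return current * 100 <= baseline * (100 + max_extra_percent);
}
```

With a baseline of 1,000,000 instructions, a run of 1,100,000 still passes and a run of 1,100,001 fails.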

My first try was to change the code to:

#include <valgrind/callgrind.h>  // provides the CALLGRIND_* macros

for (int i=0; i<maxSample; ++i) {
    //Prepare data to be processed...
    CALLGRIND_START_INSTRUMENTATION;
    //Method to be profiled with these data
    CALLGRIND_STOP_INSTRUMENTATION;
    //Post operation on the data
}
CALLGRIND_DUMP_STATS;

This adds the Callgrind macros to control the instrumentation. I also passed the --instr-atstart=no option to be sure that only the part of the code I care about is profiled.

Unfortunately, with this configuration, when I launch my executable under Callgrind, it never finishes. This is not ordinary slowness, because a fully instrumented run lasts less than one minute.

I also tried

for (int i=0; i<maxSample; ++i) {
    //Prepare data to be processed...
    CALLGRIND_TOGGLE_COLLECT;
    //Method to be profiled with these data
    CALLGRIND_TOGGLE_COLLECT;
    //Post operation on the data
}
CALLGRIND_DUMP_STATS;

(and also the --toggle-collect="myMethod" option), but Callgrind produced a log without any calls (KCachegrind is white as snow :( and reports zero instructions).

Did I use the macros/options correctly? Any idea of what I need to change in order to get the expected result?

Answer 1:

I finally managed to solve this. It was a configuration issue:

I kept the code

for (int i=0; i<maxSample; ++i) {
    //Prepare data to be processed...
    CALLGRIND_TOGGLE_COLLECT;
    //Method to be profiled with these data
    CALLGRIND_TOGGLE_COLLECT;
    //Post operation on the data
}
CALLGRIND_DUMP_STATS;

but ran Callgrind with --collect-atstart=no (and without --instr-atstart=no!), and it worked perfectly, in a reasonable time (~1 min). With collection disabled at startup, the first CALLGRIND_TOGGLE_COLLECT in each iteration switches it on and the second switches it off again.

The issue with START/STOP instrumentation was that Callgrind dumped a file (callgrind.out.#number) at each iteration (at each STOP), so it was really slow: after 5 min I had only 5,000 runs of a 300,000-iteration benchmark, which is unusable for a regression test.
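A more complete sketch of this loop might look as follows. It is an illustration under assumptions, not the original benchmark: method_under_test and run_benchmark are hypothetical stand-ins, and the fallback macro definitions are only there so that the file still compiles on machines without the Valgrind headers.

```cpp
#include <cstdint>
#include <vector>

// Use the real client-request macros when the Valgrind headers are installed.
#if defined(__has_include)
#  if __has_include(<valgrind/callgrind.h>)
#    include <valgrind/callgrind.h>
#  endif
#endif
// Fallback (assumption for portability): compile the macros away otherwise.
#ifndef CALLGRIND_TOGGLE_COLLECT
#  define CALLGRIND_TOGGLE_COLLECT ((void)0)
#endif
#ifndef CALLGRIND_DUMP_STATS
#  define CALLGRIND_DUMP_STATS ((void)0)
#endif

// Hypothetical stand-in for the method being profiled.
std::uint64_t method_under_test(const std::vector<std::uint32_t>& data) {
    std::uint64_t sum = 0;
    for (std::uint32_t v : data) sum += v;
    return sum;
}

// Runs the benchmark loop and returns a checksum so the work is not optimized out.
std::uint64_t run_benchmark(int max_sample) {
    std::uint64_t checksum = 0;
    for (int i = 0; i < max_sample; ++i) {
        // Prepare data to be processed (not collected).
        std::vector<std::uint32_t> data(64, static_cast<std::uint32_t>(i));

        CALLGRIND_TOGGLE_COLLECT;  // collection on (it is off at startup)
        std::uint64_t result = method_under_test(data);
        CALLGRIND_TOGGLE_COLLECT;  // collection off again

        // Post operation on the data (not collected).
        checksum += result;
    }
    CALLGRIND_DUMP_STATS;  // write the collected counters once, after the loop
    return checksum;
}
```

Called from main and run under valgrind --tool=callgrind --collect-atstart=no, only the cost between the two toggles should be attributed in the profile. Outside Valgrind the client-request macros have no effect, so the same binary can also run natively.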



Answer 2:

The --toggle-collect option is very picky about how you specify the function used as the trigger. You need to include its argument list as well, and even the whitespace has to match! Use the function name exactly as it appears in the Callgrind output. For instance, I am using this invocation:

$ valgrind \
    --tool=callgrind \
    --collect-atstart=no \
    "--toggle-collect=ctrl_simulate(float, int)" \
    ./swaag

Please observe:

  • The double quotes around the option.
  • The argument list including parentheses.
  • The whitespace after the comma character.
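To see where the exact spelling comes from, the sketch below demangles the symbol for ctrl_simulate(float, int) with the GCC/Clang runtime. This is an assumption-laden illustration: the helper name demangle is hypothetical, abi::__cxa_demangle is specific to the Itanium C++ ABI (GCC, Clang), and Valgrind uses its own demangler, so the name shown in the Callgrind output remains the reference.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI symbol,
// or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with malloc.
    char* raw = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) {
        return mangled;
    }
    std::string result(raw);
    std::free(raw);
    return result;
}
```

Here demangle("_Z13ctrl_simulatefi") yields ctrl_simulate(float, int), including the parentheses and the space after the comma that the option expects.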