I'd like to statistically profile my C code at the instruction level.
I need to know how many additions, multiplications, divisions, etc. I'm performing.
This is not your usual run-of-the-mill code profiling requirement. I'm an algorithm developer and I want to estimate the cost of converting my code to a hardware implementation. For this, I've been asked for a breakdown of the instructions executed at run time (parsing the compiled assembly isn't sufficient, as it doesn't account for loops in the code).
After looking around, it seems VMware may offer a possible solution, but I couldn't find the specific feature that would let me trace the instruction stream of my process.
Are you aware of any profiling tools which enable this?
I eventually used a trivial yet effective solution.
- Configured GDB to display the disassembly of the next instruction (every time it stops) by invoking:
display/i $pc
- Configured a simple gdb script that breaks in the function I need to analyze and proceeds to step instruction by instruction:
set $i=0
break main
run
while ($i<100000)
si
set $i = $i + 1
end
quit
- Executed gdb with my script, dumping the output into a log file:
gdb -x script a.out > log.txt
- Analyzed the log to count specific instruction calls.
Crude, but it works...
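For the log analysis step, a minimal sketch in Python could look like the following. It assumes the log has the usual `display/i $pc` layout, where the instruction about to execute is printed on a line starting with `=>`, followed by the address, an optional `<symbol+offset>`, a colon, and the disassembly. The function name `count_mnemonics` and the regular expression are my own illustration, not part of gdb, and may need adjusting for your gdb version or disassembly flavor.

```python
import re
import sys
from collections import Counter

# Assumed gdb layout for the current instruction, e.g.
#   => 0x40052a <main+4>:	movl   $0x0,-0x4(%rbp)
# The <symbol+offset> part is optional (stripped binaries omit it).
CURRENT_INSN = re.compile(r"^=>\s+0x[0-9a-fA-F]+(?:\s+<.*?>)?:\s+(\S+)")

def count_mnemonics(lines):
    """Count the first token of each '=>' disassembly line.

    Prefixes such as 'rep' or 'lock' are counted as their own token here.
    """
    counts = Counter()
    for line in lines:
        match = CURRENT_INSN.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Only runs when a log file is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        for mnemonic, n in count_mnemonics(log).most_common():
            print(f"{n:10d}  {mnemonic}")
```

Saved under a hypothetical name such as count_insns.py, it would be run as `python count_insns.py log.txt`. Mnemonics are counted exactly as gdb prints them, so with AT&T syntax `movl` and `mov` show up as separate entries.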
You can use pin-instat, which is a PIN tool. It's a bit overkill, as it records more information than just the instruction count. It should still be more efficient than your gdb approach, though.
Disclaimer: I'm the author of pin-instat.
The Linux tool perf will give you a good deal of profiling information; specifically, perf annotate will give you per-instruction relative counts.
It is possible to drill down to the instruction level with perf annotate. For that, you need to invoke perf annotate with the name of the command to annotate. All the functions with samples will be disassembled and each instruction will have its relative percentage of samples reported:
perf record ./noploop 5
perf annotate -d ./noploop
------------------------------------------------
Percent | Source code & Disassembly of noploop.noggdb
------------------------------------------------
:
:
:
: Disassembly of section .text:
:
: 08048484 <main>:
0.00 : 8048484: 55 push %ebp
0.00 : 8048485: 89 e5 mov %esp,%ebp [...]
0.00 : 8048530: eb 0b jmp 804853d <main+0xb9>
15.08 : 8048532: 8b 44 24 2c mov 0x2c(%esp),%eax
0.00 : 8048536: 83 c0 01 add $0x1,%eax
14.52 : 8048539: 89 44 24 2c mov %eax,0x2c(%esp)
14.27 : 804853d: 8b 44 24 2c mov 0x2c(%esp),%eax
56.13 : 8048541: 3d ff e0 f5 05 cmp $0x5f5e0ff,%eax
0.00 : 8048546: 76 ea jbe 8048532 <main+0xae> [...]
The Valgrind tool Cachegrind can be used to get execution counts of each line in the compiled assembly (the Ir value in the first column).