I know that I can use gprof to profile my code.
However, I have this problem -- I have a smart pointer that has an extra level of indirection (think of it as a proxy object).
As a result, I have this extra layer that affects pretty much all functions and screws with caching.
Is there a way to measure the time my CPU wastes due to cache misses?
Thanks!
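Linux supports this with perf from kernel 2.6.31 on. This allows you to do the following:

1. Compile your code with -g so that debug information is included.
2. Run your program under perf, recording last-level cache (LLC) loads and misses:
    perf record -e LLC-loads,LLC-load-misses yourExecutable
3. Run perf report, select the LLC-load-misses line, then, for example, the first function, and then annotate.

You should see the lines (in assembly code, surrounded by the original source code) and, for the lines where cache misses occurred, a number indicating what fraction of the last-level cache misses they account for.

If you're running an AMD processor, you can get CodeAnalyst, apparently free as in beer.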
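You could find a tool that accesses the CPU performance counters. Most modern CPUs have per-core hardware counters that can count L1, L2, etc. misses. Alternatively, Cachegrind (part of Valgrind) can simulate how your program uses the caches, instruction by instruction, and report misses per function and per source line; it runs much slower than native, and the numbers come from a simulated cache rather than your real one.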
However, I don't think that would be insightful. Your proxy objects are presumably modified by their own methods, so a conventional profiler will already tell you how much time those methods are taking. No profiling tool will tell you how much performance would improve without that source of cache pollution. That depends on shrinking the program's working set and improving its layout, which isn't easy to extrapolate.
A quick Google search turned up boost::intrusive_ptr, which might interest you. It doesn't appear to support something like weak_ptr, but converting your program might be trivial, and then you would know for sure the cost of the non-intrusive ref counts.