Is there any way to count L3 cache hits and misses with the perf tool on Linux? According to the output of perf list cache, L1 and LLC cache events are supported. According to the definition of the perf_evsel__hw_cache array in perf's source code:
const char *perf_evsel__hw_cache[PERF_COUNT_HW_CACHE_MAX]
[PERF_EVSEL__MAX_ALIASES] = {
{ "L1-dcache", "l1-d", "l1d", "L1-data", },
{ "L1-icache", "l1-i", "l1i", "L1-instruction", },
{ "LLC", "L2", },
{ "dTLB", "d-tlb", "Data-TLB", },
{ "iTLB", "i-tlb", "Instruction-TLB", },
{ "branch", "branches", "bpu", "btb", "bpc", },
{ "node", },
};
LLC is an alias for the L2 cache. My question is how to count L3 cache hits and misses with the perf tool on Linux. Thanks in advance!
It is strange that LLC (Last Level Cache) is aliased to "L2" when the hardware has an L3 cache. I don't know perf's internals yet, though, and these settings may simply be generic.
I think the only solution you have is to use a "raw hardware event" (see the end of the "perf list" output, the line starting with "rNNN"). That lets you encode the hardware register settings directly.
The perf user guide and tutorial only mention: "To measure an actual PMU as provided by the HW vendor documentation, pass the hexadecimal parameter code". I don't know what the syntax is on Intel, or whether there are different implementations of the performance monitor on this architecture. You could start here:
http://code.google.com/p/kernel/wiki/PerfUserGuide#Hardware_events
I have had more success using raw event counters, looking directly at the Intel Software Developer's Manual for detailed definitions.
http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.html
From section 18.2.1.2, "Pre-defined Architectural Performance Events":
r412e ("LLC Misses": unit mask 0x41, event select 0x2E) is likely the one you want:
perf stat -e r412e <command>
(Note that for me, this gives the same number as using -e cache-misses.)
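To make the rNNN encoding less opaque, here is a minimal Python sketch of how the code is put together from the two fields in the SDM table: the unit mask goes in the high byte and the event select in the low byte. The helper name raw_event is made up for illustration, and the sketch assumes a plain architectural event with no extra modifier bits (cmask, inv, edge).

```python
def raw_event(event_select: int, umask: int) -> str:
    """Build a perf raw-event string from an event select and a unit mask.

    Hypothetical helper: it only covers the simple case where the raw code
    is (umask << 8) | event_select, with no other modifier bits set.
    """
    if not (0 <= event_select <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event_select and umask must each fit in one byte")
    return f"r{(umask << 8) | event_select:04x}"


# Architectural events from the Intel SDM table of pre-defined events.
LLC_MISSES = raw_event(0x2E, 0x41)      # "LLC Misses"
LLC_REFERENCES = raw_event(0x2E, 0x4F)  # "LLC Reference"

# Print a command line that counts both, so a miss ratio can be derived.
print(f"perf stat -e {LLC_REFERENCES},{LLC_MISSES} <command>")
```

Counting "LLC Reference" next to "LLC Misses" gives you both numbers needed for a ratio. Model-specific events should be looked up in the manual for your exact CPU.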
To get the system-wide L3 cache miss rate, just run:
$ sudo perf stat -a -e LLC-loads,LLC-load-misses,LLC-stores,LLC-store-misses,LLC-prefetch-misses sleep 5
 Performance counter stats for 'system wide':

    24,477,266,369      LLC-loads                                              (22.65%)
     1,409,470,007      LLC-load-misses    #  5.76% of all LL-cache hits       (29.79%)
        88,584,705      LLC-stores                                             (30.32%)
        10,545,277      LLC-store-misses                                       (30.03%)
       150,785,745      LLC-prefetch-misses                                    (34.71%)

      13.773144159 seconds time elapsed
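This prints out both the misses and the total references. Dividing LLC-load-misses by LLC-loads gives the load miss rate (5.76% here), which is the L3 miss rate on a machine whose last-level cache is L3.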
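If you want to compute the ratios yourself rather than rely on the annotation perf prints, the counts can be pulled out of the text output. The following is a rough Python sketch; parse_perf_counts is a hypothetical helper, and it assumes the default human-readable perf stat layout (count with thousands separators, then the event name).

```python
import re


def parse_perf_counts(text: str) -> dict:
    """Map event name -> count for lines like '1,409,470,007 LLC-load-misses ...'."""
    counts = {}
    for line in text.splitlines():
        # A count with thousands separators, whitespace, then the event name.
        m = re.match(r"\s*([\d,]+)\s+([A-Za-z][\w-]*)", line)
        if m:
            counts[m.group(2)] = int(m.group(1).replace(",", ""))
    return counts


# Counter lines copied from the perf stat output above.
sample = """\
24,477,266,369 LLC-loads (22.65%)
1,409,470,007 LLC-load-misses # 5.76% of all LL-cache hits (29.79%)
88,584,705 LLC-stores (30.32%)
10,545,277 LLC-store-misses (30.03%)
13.773144159 seconds time elapsed
"""

counts = parse_perf_counts(sample)
load_miss_rate = 100.0 * counts["LLC-load-misses"] / counts["LLC-loads"]
store_miss_rate = 100.0 * counts["LLC-store-misses"] / counts["LLC-stores"]

print(f"LLC load miss rate:  {load_miss_rate:.2f}%")   # 5.76%, matching perf's annotation
print(f"LLC store miss rate: {store_miss_rate:.2f}%")
```

The percentages in parentheses in the perf output show that each counter was only active for part of the run, so the counts are scaled estimates and the ratios are approximate.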
See the complete event list on the wiki: https://perf.wiki.kernel.org/index.php/Tutorial#Events