On Intel x86, Linux uses the event l1d.replacement to implement its L1-dcache-load-misses event.
This event is defined as follows:
Counts L1D data line replacements including opportunistic replacements, and replacements that require stall-for-replace or block-for-replace.
Perhaps naively, I would have expected perf to use something like mem_load_retired.l1_miss, which supports PEBS and is defined as:
Counts retired load instructions with at least one uop that missed in the L1 cache. (Supports PEBS)
The two event counts are usually not very close, and sometimes they differ wildly. For example:
$ ocperf stat -e mem_inst_retired.all_loads,l1d.replacement,mem_load_retired.l1_hit,mem_load_retired.l1_miss,mem_load_retired.fb_hit head -c100M /dev/urandom > /dev/null
Performance counter stats for 'head -c100M /dev/urandom':
445,662,315 mem_inst_retired_all_loads
92,968 l1d_replacement
443,864,439 mem_load_retired_l1_hit
1,694,671 mem_load_retired_l1_miss
28,080 mem_load_retired_fb_hit
mem_load_retired.l1_miss reports more than 18 times as many "L1 misses" as l1d.replacement. Conversely, you can also find examples where l1d.replacement is much higher than the mem_load_retired counters.
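To make the comparison concrete, here is a minimal Python sketch that redoes the arithmetic on the counts from the run above. The helper name parse_counts and the PERF_OUTPUT string are hypothetical and exist only for this illustration; the numbers are copied verbatim from the output shown, and the script does not invoke perf itself.

```python
import re

# Counter lines copied from the ocperf run above (illustrative input only).
PERF_OUTPUT = """\
445,662,315 mem_inst_retired_all_loads
92,968 l1d_replacement
443,864,439 mem_load_retired_l1_hit
1,694,671 mem_load_retired_l1_miss
28,080 mem_load_retired_fb_hit
"""


def parse_counts(text):
    """Map each event name to its integer count (hypothetical helper)."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"\s*([\d,]+)\s+(\S+)", line)
        if match:
            counts[match.group(2)] = int(match.group(1).replace(",", ""))
    return counts


counts = parse_counts(PERF_OUTPUT)

# How many times larger is the retired-load miss count than the replacement count?
ratio = counts["mem_load_retired_l1_miss"] / counts["l1d_replacement"]

# Loads attributed to an L1 hit, an L1 miss, or a fill-buffer hit.
accounted = (
    counts["mem_load_retired_l1_hit"]
    + counts["mem_load_retired_l1_miss"]
    + counts["mem_load_retired_fb_hit"]
)
# Loads counted by all_loads but by none of the three categories above.
unaccounted = counts["mem_inst_retired_all_loads"] - accounted

print(f"l1_miss / l1d.replacement ratio: {ratio:.1f}")
print(f"loads not in hit/miss/fb_hit:    {unaccounted}")
```

For this run the ratio comes out at about 18.2, and the three mem_load_retired counters fall 75,125 short of mem_inst_retired.all_loads.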
What exactly is l1d.replacement measuring, why was it chosen in the kernel, and is it a better proxy for L1 d-cache misses than mem_load_retired.l1_miss?