For some reason, I can't sample (perf record) hardware cache events:
# perf record -e L1-dcache-stores -a -c 100 -- sleep 5
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.607 MB perf.data (~26517 samples) ]
# perf script
but I can count them (perf stat):
# perf stat -e L1-dcache-stores -a -- sleep 5
Performance counter stats for 'sleep 5':
711,781 L1-dcache-stores
5.000842990 seconds time elapsed
I tried different CPUs, OS versions (and kernel versions), and perf versions, but the result is the same. Is this expected behaviour? What is the reason? Can't perf warn about this?
There is a difference in the perf evlist -vvv output for three perf.data files: one recorded with a cache event, one with the hardware cycles event, and one with a software event:
echo '2^234567 %2' | perf record -e L1-dcache-stores -c 100 -o cache bc
echo '2^234567 %2' | perf record -e cycles -c 100 -o cycles bc
echo '2^234567 %2' | perf record -e cs -c 100 -o cs bc
perf evlist -vvv -i cache
L1-dcache-stores: sample_freq=100, type: 3, config: 256, size: 96, sample_type: IP|TID|TIME, disabled: 1, inherit: 1, mmap: 1, mmap2: 1, comm: 1, enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1
perf evlist -vvv -i cycles
cycles: sample_freq=100, size: 96, sample_type: IP|TID|TIME, disabled: 1, inherit: 1, mmap: 1, mmap2: 1, comm: 1, enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1
perf evlist -vvv -i cs
cs: sample_freq=100, type: 1, config: 3, size: 96, sample_type: IP|TID|TIME, disabled: 1, inherit: 1, mmap: 1, mmap2: 1, comm: 1, enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1
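The events have different types: 3 for the cache event and 1 for cs, while the cycles line shows no type field at all, which is consistent with type 0 (PERF_TYPE_HARDWARE). The types are defined as: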
enum perf_type_id {
	PERF_TYPE_HARDWARE = 0,
	PERF_TYPE_SOFTWARE = 1,
	PERF_TYPE_TRACEPOINT = 2,
	PERF_TYPE_HW_CACHE = 3,
	PERF_TYPE_RAW = 4,
	PERF_TYPE_BREAKPOINT = 5,

	PERF_TYPE_MAX, /* non-ABI */
};
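To read the type: values in the evlist output against this enum, a small lookup can be sketched as below. This is only an illustration: perf_type_name is a hypothetical helper, not a function from perf or the kernel.

```c
/* Same numbering as the enum perf_type_id quoted above. */
enum perf_type_id {
	PERF_TYPE_HARDWARE = 0,
	PERF_TYPE_SOFTWARE = 1,
	PERF_TYPE_TRACEPOINT = 2,
	PERF_TYPE_HW_CACHE = 3,
	PERF_TYPE_RAW = 4,
	PERF_TYPE_BREAKPOINT = 5,

	PERF_TYPE_MAX, /* non-ABI */
};

/*
 * Hypothetical helper (not part of perf): map the number printed after
 * "type:" by perf evlist -vvv to the name of the enum constant.
 */
const char *perf_type_name(unsigned int type)
{
	static const char *const names[PERF_TYPE_MAX] = {
		[PERF_TYPE_HARDWARE]   = "PERF_TYPE_HARDWARE",
		[PERF_TYPE_SOFTWARE]   = "PERF_TYPE_SOFTWARE",
		[PERF_TYPE_TRACEPOINT] = "PERF_TYPE_TRACEPOINT",
		[PERF_TYPE_HW_CACHE]   = "PERF_TYPE_HW_CACHE",
		[PERF_TYPE_RAW]        = "PERF_TYPE_RAW",
		[PERF_TYPE_BREAKPOINT] = "PERF_TYPE_BREAKPOINT",
	};

	/* Anything outside the enum range is reported as unknown. */
	return type < PERF_TYPE_MAX ? names[type] : "unknown";
}
```

With this, the type: 3 printed for L1-dcache-stores maps to PERF_TYPE_HW_CACHE and the type: 1 printed for cs maps to PERF_TYPE_SOFTWARE.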
perf script has an output table that defines how to print events of each type: http://lxr.free-electrons.com/source/tools/perf/builtin-script.c?v=3.16#L68
/* default set to maintain compatibility with current format */
static struct {
	bool user_set;
	bool wildcard_set;
	unsigned int print_ip_opts;
	u64 fields;
	u64 invalid_fields;
} output[PERF_TYPE_MAX] = {

	[PERF_TYPE_HARDWARE] = {
		.user_set = false,

		.fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID |
			  PERF_OUTPUT_CPU | PERF_OUTPUT_TIME |
			  PERF_OUTPUT_EVNAME | PERF_OUTPUT_IP |
			  PERF_OUTPUT_SYM | PERF_OUTPUT_DSO,

		.invalid_fields = PERF_OUTPUT_TRACE,
	},

	[PERF_TYPE_SOFTWARE] = {
		.user_set = false,

		.fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID |
			  PERF_OUTPUT_CPU | PERF_OUTPUT_TIME |
			  PERF_OUTPUT_EVNAME | PERF_OUTPUT_IP |
			  PERF_OUTPUT_SYM | PERF_OUTPUT_DSO,

		.invalid_fields = PERF_OUTPUT_TRACE,
	},

	[PERF_TYPE_TRACEPOINT] = {
		.user_set = false,

		.fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID |
			  PERF_OUTPUT_CPU | PERF_OUTPUT_TIME |
			  PERF_OUTPUT_EVNAME | PERF_OUTPUT_TRACE,
	},

	[PERF_TYPE_RAW] = {
		.user_set = false,

		.fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID |
			  PERF_OUTPUT_CPU | PERF_OUTPUT_TIME |
			  PERF_OUTPUT_EVNAME | PERF_OUTPUT_IP |
			  PERF_OUTPUT_SYM | PERF_OUTPUT_DSO,

		.invalid_fields = PERF_OUTPUT_TRACE,
	},
};
So there are no instructions for printing any field from samples of type 3 (PERF_TYPE_HW_CACHE), and perf script does not print them. We can try to register this type in the output array and even submit the patch to the kernel.
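A minimal, self-contained sketch of that idea follows. It is a reduced model of the table, not a patch against builtin-script.c: the PERF_OUTPUT_* bit values are placeholders chosen for this example, the struct is trimmed to three members, and has_output_fields is a hypothetical helper. It illustrates two points: an array element skipped by designated initializers is zero-filled in C, so .fields is 0 for type 3, and an added [PERF_TYPE_HW_CACHE] entry that copies the hardware defaults would give that type a non-empty field set.

```c
#include <stdbool.h>
#include <stdint.h>

enum perf_type_id {
	PERF_TYPE_HARDWARE = 0,
	PERF_TYPE_SOFTWARE = 1,
	PERF_TYPE_TRACEPOINT = 2,
	PERF_TYPE_HW_CACHE = 3,
	PERF_TYPE_RAW = 4,
	PERF_TYPE_BREAKPOINT = 5,
	PERF_TYPE_MAX, /* non-ABI */
};

/* Placeholder bit values: assumptions for this sketch, not perf's own. */
enum perf_output_field {
	PERF_OUTPUT_COMM   = 1u << 0,
	PERF_OUTPUT_TID    = 1u << 1,
	PERF_OUTPUT_CPU    = 1u << 2,
	PERF_OUTPUT_TIME   = 1u << 3,
	PERF_OUTPUT_EVNAME = 1u << 4,
	PERF_OUTPUT_IP     = 1u << 5,
	PERF_OUTPUT_SYM    = 1u << 6,
	PERF_OUTPUT_DSO    = 1u << 7,
	PERF_OUTPUT_TRACE  = 1u << 8,
};

/* Trimmed-down version of the anonymous struct in builtin-script.c. */
struct output_option {
	bool user_set;
	uint64_t fields;
	uint64_t invalid_fields;
};

/* Defaults shared by the hardware, software and raw entries. */
#define SAMPLE_ENTRY {							\
	.user_set = false,						\
	.fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID |			\
		  PERF_OUTPUT_CPU | PERF_OUTPUT_TIME |			\
		  PERF_OUTPUT_EVNAME | PERF_OUTPUT_IP |			\
		  PERF_OUTPUT_SYM | PERF_OUTPUT_DSO,			\
	.invalid_fields = PERF_OUTPUT_TRACE,				\
}

#define TRACE_ENTRY {							\
	.user_set = false,						\
	.fields = PERF_OUTPUT_COMM | PERF_OUTPUT_TID |			\
		  PERF_OUTPUT_CPU | PERF_OUTPUT_TIME |			\
		  PERF_OUTPUT_EVNAME | PERF_OUTPUT_TRACE,		\
}

/* Model of the table quoted above: no entry for PERF_TYPE_HW_CACHE,
 * so that element is zero-initialized and its .fields is 0. */
static const struct output_option output_v316[PERF_TYPE_MAX] = {
	[PERF_TYPE_HARDWARE]   = SAMPLE_ENTRY,
	[PERF_TYPE_SOFTWARE]   = SAMPLE_ENTRY,
	[PERF_TYPE_TRACEPOINT] = TRACE_ENTRY,
	[PERF_TYPE_RAW]        = SAMPLE_ENTRY,
};

/* Proposed change: register PERF_TYPE_HW_CACHE with the hardware defaults. */
static const struct output_option output_patched[PERF_TYPE_MAX] = {
	[PERF_TYPE_HARDWARE]   = SAMPLE_ENTRY,
	[PERF_TYPE_SOFTWARE]   = SAMPLE_ENTRY,
	[PERF_TYPE_TRACEPOINT] = TRACE_ENTRY,
	[PERF_TYPE_HW_CACHE]   = SAMPLE_ENTRY,
	[PERF_TYPE_RAW]        = SAMPLE_ENTRY,
};

/* Hypothetical helper: does the chosen table select any field for type? */
bool has_output_fields(bool patched, unsigned int type)
{
	const struct output_option *table =
		patched ? output_patched : output_v316;

	return type < PERF_TYPE_MAX && table[type].fields != 0;
}
```

In this model, has_output_fields(false, PERF_TYPE_HW_CACHE) is false and has_output_fields(true, PERF_TYPE_HW_CACHE) is true. Whether a real patch would need further changes elsewhere in builtin-script.c is not covered by this sketch.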