I'm using perf_event_open to collect samples. I am trying to capture every hit of a point, but perf_event_open is not fast enough. I tried to change the sample rate with the command below:
echo 10000000 > /proc/sys/kernel/perf_event_max_sample_rate
But it looks like the value I set was too large. After running my code, perf_event_max_sample_rate changes back to a lower value such as 12500. When I try larger values, for example 20000000 or 50000000, the sampling speed does not increase to match the value I set. Is there any way to make perf_event_open sample faster?
This is a mechanism to limit the overhead caused by perf. You can disable it by setting
sysctl -w kernel.perf_cpu_time_max_percent=0
Use at your own risk: the system may stop responding.
https://www.kernel.org/doc/Documentation/sysctl/kernel.txt
perf_cpu_time_max_percent:
Hints to the kernel how much CPU time it should be allowed to use to
handle perf sampling events. If the perf subsystem is informed that
its samples are exceeding this limit, it will drop its sampling
frequency to attempt to reduce its CPU usage.
Some perf sampling happens in NMIs. If these samples unexpectedly
take too long to execute, the NMIs can become stacked up next to each
other so much that nothing else is allowed to execute.
0: disable the mechanism. Do not monitor or correct perf's
sampling rate no matter how CPU time it takes.
1-100: attempt to throttle perf's sample rate to this percentage of
CPU. Note: the kernel calculates an "expected" length of each
sample event. 100 here means 100% of that expected length. Even
if this is set to 100, you may still see sample throttling if this
length is exceeded. Set to 0 if you truly do not care how much CPU
is consumed.
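To see whether the kernel has lowered the limit behind your back, you can read both sysctls before and after a profiling run. The following is a minimal sketch; `read_perf_sysctl` is a hypothetical helper name, and it assumes the usual `/proc/sys/kernel` layout on Linux.

```python
from pathlib import Path


def read_perf_sysctl(name, root="/proc/sys/kernel"):
    """Return the integer value of a kernel sysctl file, or None if it cannot be read."""
    try:
        return int((Path(root) / name).read_text().strip())
    except (OSError, ValueError):
        # File missing (non-Linux, restricted container) or not a plain integer.
        return None


if __name__ == "__main__":
    for name in ("perf_event_max_sample_rate", "perf_cpu_time_max_percent"):
        print(name, "=", read_perf_sysctl(name))
```

If perf_event_max_sample_rate is lower after the run than the value you wrote, the throttling described in the quoted documentation has kicked in.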
It is really not possible to increase perf_event_max_sample_rate beyond a certain value. I have tried raising it above 100,000, for example to 200,000 or more. Every time I did this, the max sample rate came back down to about 146,500 samples/sec or less. If I recall correctly, 146,500 samples/sec was the maximum I could achieve. This of course depends on the kind of machine you are using, its CPU frequency, and so on. I was working on an Intel Xeon v-5 Broadwell CPU.
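As a rough back-of-envelope check, you can turn an observed rate into the CPU time each sample is allowed to cost. This sketch assumes perf_cpu_time_max_percent is at its default of 25; `per_sample_budget_ns` is a hypothetical helper, and the kernel keeps some extra margin on top, so treat the result as an upper bound rather than a measured handler time.

```python
def per_sample_budget_ns(samples_per_sec, cpu_percent=25):
    """Nanoseconds each sample may cost if sampling is capped at cpu_percent of one CPU."""
    return (cpu_percent / 100) * 1e9 / samples_per_sec


if __name__ == "__main__":
    for rate in (12_500, 146_500):
        print(f"{rate} samples/sec -> {per_sample_budget_ns(rate):.0f} ns per sample")
```

Under that assumption, a ceiling of 146,500 samples/sec corresponds to a budget of roughly 1.7 microseconds per sample, and 12,500 samples/sec to about 20 microseconds.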
Zulan makes a good point. To make things clearer: perf sample collection is interrupt-driven. Every time the sampling counter overflows, perf raises an NMI (non-maskable interrupt). The interrupt handler also measures how long it takes to handle the whole interrupt. You can see this in the kernel code below:
perf_event_nmi_handler
Once it has measured the time spent handling the interrupt, it passes that time as a parameter to another function, which compares it against the current perf_event_max_sample_rate. If handling an interrupt takes long enough while samples are being generated very frequently, the CPU cannot keep up because interrupt work starts to pile up, and you will observe some amount of throttling: in that case the function always tries to reduce the sample rate. Read the function below to understand:
perf_sample_event_took
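The throttling step described above can be sketched as a small simulation. This is a simplified model written from my reading of the kernel logic, not the kernel code itself: `ThrottleModel`, the 128-sample averaging window, the 25% safety margin, and HZ=250 are all assumptions for illustration, and real kernels may differ in detail.

```python
NSEC_PER_SEC = 1_000_000_000
WINDOW = 128  # assumed number of samples in the running average


class ThrottleModel:
    """Toy model of how the max sample rate is lowered when NMIs take too long."""

    def __init__(self, max_sample_rate, cpu_time_max_percent=25, hz=250):
        self.hz = hz  # assumed timer tick rate
        self.percent = cpu_time_max_percent
        self.max_sample_rate = max_sample_rate
        self.running_len = 0  # decaying sum of handler durations (ns)
        period_ns = NSEC_PER_SEC // max_sample_rate
        # Allowed handler time per sample; 0 means "never throttle".
        self.allowed_ns = 0 if self.percent == 0 else max(1, period_ns * self.percent // 100)

    def sample_took(self, sample_len_ns):
        """Feed one handler duration; return the (possibly lowered) max sample rate."""
        if self.allowed_ns == 0:
            return self.max_sample_rate
        # Decay the running sum by one average sample, then add the new one.
        self.running_len -= self.running_len // WINDOW
        self.running_len += sample_len_ns
        avg = self.running_len // WINDOW
        if avg <= self.allowed_ns:
            return self.max_sample_rate
        # Over budget: pick a new limit with a 25% margin above the average cost.
        avg += avg // 4
        tick_ns = NSEC_PER_SEC // self.hz
        budget_ns = (tick_ns // 100) * self.percent  # CPU time allowed per tick
        per_tick = budget_ns // avg if avg < budget_ns else 1
        self.allowed_ns = avg
        self.max_sample_rate = per_tick * self.hz
        return self.max_sample_rate


if __name__ == "__main__":
    model = ThrottleModel(max_sample_rate=10_000_000)
    for _ in range(5000):
        rate = model.sample_took(4000)  # pretend every NMI costs 4 microseconds
    print("settled max sample rate:", rate)
```

In this model, with an assumed handler cost of 4 microseconds, the limit settles between 50,000 and 62,500 samples/sec no matter how large the starting value is, which mirrors the behaviour in the question: writing a bigger number does not help, because the ceiling is set by how long each interrupt takes.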
Of course, as Zulan suggested, you can try setting it to 0, but you would get the same maximum number of samples from perf while hurting the CPU further. It is not possible to increase the maximum unless you find other means (like tweaking the buffer, if possible).