What does the OProfile warning 'dropping hyper

2019-06-25 11:14发布

问题:

When using the statistical execution profiler OProfile to visualise a callgraph profile for my C application, it includes the following warning multiple times. The warning is rather cryptic to me:

warning: dropping hyperspace sample at offset 1af9 >= 2be8 for binary /home/myuser/mybinary

I'm using OProfile in a Xen virtualized environment like this:

modprobe oprofile timer=1
opcontrol --no-vmlinux
opcontrol --start
(wait for profiling data to accumulate)
opcontrol --stop
opreport --session-dir=/var/lib/oprofile --exclude-dependent --demangle=smart \
--symbols /home/myuser/mybinary --callgraph

The complete output from the last command is:

Overflow stats not available
CPU: CPU with timer interrupt, speed 0 MHz (estimated)
Profiling through timer interrupt
warning: dropping hyperspace sample at offset 84d0 >= 79a0 for binary /home/myuser/mybinary
warning: dropping hyperspace sample at offset 7ac0 >= 79a0 for binary /home/myuser/mybinary
warning: dropping hyperspace sample at offset 7d90 >= 79a0 for binary /home/myuser/mybinary
warning: dropping hyperspace sample at offset 7ac0 >= 79a0 for binary /home/myuser/mybinary
warning: dropping hyperspace sample at offset 7d90 >= 79a0 for binary /home/myuser/mybinary
warning: dropping hyperspace sample at offset 8210 >= 79a0 for binary /home/myuser/mybinary
samples  %        symbol name
-------------------------------------------------------------------------------

After that it prints plausible-looking call graph data.

What does the 'hyperspace' warning mean? What causes it? Does it impact profiling results? How can I fix it?

回答1:

Maynard Johnson explains this warning in a message to a mailinglist:

There have been cases reported where samples recorded by the oprofile kernel driver appear to be attributed to the wrong binary, in particular if the sample rate is very high or when doing callgraph profiling (since callgraph profiling, like a high sample rate, also results in very high overhead of the oprofile kernel driver and overflows of its internal sample buffers). I suspect that's what you're running into. Unfortunately, this is a very insidious bug, and no one has been able to find the root cause yet. The kernel driver does report overflow counts of its internal buffers, and the oprofiled log prints those out. As a convenience, starting with oprofile 0.9.5, opreport will also print a warning when it finds non-zero overflow counts and suggests lowering the sampling interval.

I suggest looking in your /var/lib/oprofile/samples/oprofiled.log and find the overflow statistics for the above profile run (log entries are timestamped). If you're only seeing a few or a very small percentage (say, less than 3%), you can probably just ignore the anomalies. In general, to avoid/limit this kind of thing, you should profile at the lowest sampling rate practical, especially when you're doing callgraph profiling. So what do I mean by "practical"? Well, as with any sample-based profiler, oprofile is statistical in nature. And the more data points you have, the more confidence you have in the data. So for 100% confidence, you should (theoretically) profile with a count value of '1'. Not too practical, though, since your machine may appear to lock up because most of the work being done is recording samples. For cycles event profiling, you can probably use a count value of several million or so (on today's processors) and still be pretty confident of the da ta. For other events, it really depends on their frequency of occurrence.