After watching the presentation "Performance Anxiety" of Joshua Bloch, I read the paper he suggested in the presentation "Evaluating the Accuracy of Java Profilers". Quoting the conclusion:
Our results are disturbing because they indicate that profiler incorrectness is pervasive—occurring in most of our seven benchmarks and in two production JVM—-and significant—all four of the state-of-the-art profilers produce incorrect profiles. Incorrect profiles can easily cause a performance analyst to spend time optimizing cold methods that will have minimal effect on performance. We show that a proof-of-concept profiler that does not use yield points for sampling does not suffer from the above problems
The conclusion of the paper is that we cannot really believe the result of profilers. But then, what is the alternative of using profilers. Should we go back and just use our feeling to do optimization?
UPDATE: A point that seems to be missed in the discussion is observer effect. Can we build a profiler that really 'observer effect'-free?
Actually, you are better off profiling at the database level. Most enterprise databases come with the ability to show the top queries over a period of time. Start working on those queries until the top ones are down to 300 ms or less, and you will have made great progress. Profilers are useful for showing behavior of the heap and for identifying blocked threads, but I personally have never gotten much traction with the development teams on identifying hot methods or large objects.
If I read it correctly, the paper only talks about sample-based profiling. Many profilers also do instrumentation-based profiling. It's much slower and has some other problems, but it should not suffer from the biases the paper talks about.
No. The conclusion of the paper is that current profilers' measuring methods have specific defects. They propose a fix. The paper is quite recent. I'd expect profilers to implement this fix eventually. Until then, even a defective profiler is still much better than "feeling".
If you don't trust profilers, then you can go into paranoia mode by using aspect oriented programming, wrapping around every method in your application and then using a logger to log every method invocation.
Your application will really slow down, but at least you'll have a precise count of how many times each method is invoked. If you also want to see how long each method takes to execute, wrap around every method perf4j.
After dumping all these statistics to text files, use some tools to extract all necessary information and then visualize it. I'd guess this will give you a pretty good overview of how slow your application is in certain places.
Oh, man, where to begin?
First, I'm amazed that this is news. Second, the problem is not that profilers are bad, it is that some profilers are bad. The authors built one that, they feel, is good, just by avoiding some of the mistakes they found in the ones they evaluated. Mistakes are common because of some persistent myths about performance profiling.
But let's be positive. If one wants to find opportunities for speedup, it is really very simple:
Sampling should be uncorrelated with the state of the program.
That means happening at a truly random time, regardless of whether the program is in I/O (except for user input), or in GC, or in a tight CPU loop, or whatever.
Sampling should read the function call stack,
so as to determine which statements were "active" at the time of the sample. The reason is that every call site (point at which a function is called) has a percentage cost equal to the fraction of time it is on the stack. (Note: the paper is concerned entirely with self-time, ignoring the massive impact of avoidable function calls in large software. In fact, the reason behind the original
gprof
was to help find those calls.)Reporting should show percent by line (not by function).
If a "hot" function is identified, one still has to hunt inside it for the "hot" lines of code accounting for the time. That information is in the samples! Why hide it?
An almost universal mistake (that the paper shares) is to be concerned too much with accuracy of measurement, and not enough with accuracy of location. For example, here is an example of performance tuning in which a series of performance problems were identified and fixed, resulting in a compounded speedup of 43 times. It was not essential to know precisely the size of each problem before fixing it, but to know its location. A phenomenon of performance tuning is that fixing one problem, by reducing the time, magnifies the percentages of remaining problems, so they are easier to find. As long as any problem is found and fixed, progress is made toward the goal of finding and fixing all the problems. It is not essential to fix them in decreasing size order, but it is essential to pinpoint them.
On the subject of statistical accuracy of measurement, if a call point is on the stack some percent of time F (like 20%), and N (like 100) random-time samples are taken, then the number of samples that show the call point is a binomial distribution, with mean = NF = 20, standard deviation = sqrt(NF(1-F)) = sqrt(16) = 4. So the percent of samples that show it will be 20% +/- 4%. So is that accurate? Not really, but has the problem been found? Precisely.
In fact, the larger a problem is, in terms of percent, the fewer samples are needed to locate it. For example, if 3 samples are taken, and a call point shows up on 2 of them, it is highly likely to be very costly. (Specifically, it follows a beta distribution. If you generate 4 uniform 0,1 random numbers, and sort them, the distribution of the 3rd one is the distribution of cost for that call point. It's mean is (2+1)/(3+2) = 0.6, so that is the expected savings, given those samples.) INSERTED: And the speedup factor you get is governed by another distribution, BetaPrime, and its average is 4. So if you take 3 samples, see a problem on 2 of them, and eliminate that problem, on average you will make the program four times faster.
It's high time we programmers blew the cobwebs out of our heads on the subject of profiling.
Disclaimer - the paper failed to reference my article: Dunlavey, “Performance tuning with instruction-level cost derived from call-stack sampling”, ACM SIGPLAN Notices 42, 8 (August, 2007), pp. 4-8.
Unless you are building bleeding edge applications that need every CPU cycle then I have found that profilers are a good way to find the 10% slowest parts of your code. As a developer, I would argue that should be all you really care about in nearly all cases.
I have experience with http://www.dynatrace.com/en/ and I can tell you it is very good at finding the low hanging fruit.
Profilers are like any other tool and they have their quirks but I would trust them over a human any day to find the hot spots in your app to look at.