I'm trying to create a benchmark test with java. Currently I have the following simple method:
public static long runTest(int times){
long start = System.nanoTime();
String str = "str";
for(int i=0; i<times; i++){
str = "str"+i;
}
return System.nanoTime()-start;
}
I'm currently having this loop multiple times within another loop that is happening multiple times and getting the min/max/avg time it takes to run this method through. Then I am starting some activity on another thread and testing again. Basically I am just wanting to get consistent results... It seems pretty consistent if I have the runTest loop 10 million times:
Number of times ran: 5
The max time was: 1231419504 (102.85% of the average)
The min time was: 1177508466 (98.35% of the average)
The average time was: 1197291937
The difference between the max and min is: 4.58%
Activated thread activity.
Number of times ran: 5
The max time was: 3872724739 (100.82% of the average)
The min time was: 3804827995 (99.05% of the average)
The average time was: 3841216849
The difference between the max and min is: 1.78%
Running with thread activity took 320.83% as much time as running without.
But this seems a bit much, and takes some time... if I try a lower number (100000) in the runTest loop... it starts to become very inconsistent:
Number of times ran: 5
The max time was: 34726168 (143.01% of the average)
The min time was: 20889055 (86.02% of the average)
The average time was: 24283026
The difference between the max and min is: 66.24%
Activated thread activity.
Number of times ran: 5
The max time was: 143950627 (148.83% of the average)
The min time was: 64780554 (66.98% of the average)
The average time was: 96719589
The difference between the max and min is: 122.21%
Running with thread activity took 398.3% as much time as running without.
Is there a way that I can do a benchmark like this that is both consistent and efficient/fast?
I'm not testing the code that is between the start and end times by the way. I'm testing the CPU load in a way (see how I'm starting some thread activity and retesting). So I think that what I'm looking for it something to substitute for the code I have in "runTest" that will yield quicker and more consistent results.
Thanks
In short:
(Micro-)benchmarking is very complex, so use a tool like the Benchmarking framework http://www.ellipticgroup.com/misc/projectLibrary.zip - and still be skeptical about the results ("Put micro-trust in a micro-benchmark", Dr. Cliff Click).
In detail:
There are a lot of factors that can strongly influence the results:
Brent Boyer's article "Robust Java benchmarking, Part 1: Issues" ( http://www.ibm.com/developerworks/java/library/j-benchmark1/index.html) is a good description of all those issues and whether/what you can do against them (e.g. use JVM options or call ProcessIdleTask beforehand).
You won't be able to eliminate all these factors, so doing statistics is a good idea. But:
The above mentioned Benchmark framework ( http://www.ellipticgroup.com/misc/projectLibrary.zip) uses these techniques. You can read about them in Brent Boyer's article "Robust Java benchmarking, Part 2: Statistics and solutions" ( https://www.ibm.com/developerworks/java/library/j-benchmark2/).
Your code ends up testing mainly garbage collection performance because appending to a String in a loop ends up creating and immediately discarding a large number of increasingly large String objects.
This is something that inherently leads to wildly varying measurements and is influenced strongy by multi-thread activity.
I suggest you do something else in your loop that has more predictable performance, like mathematical calculations.
In the 10 million times run, odds are good the HotSpot compiler detected a "heavily used" piece of code and compiled it into machine native code.
JVM bytecode is interpreted, which leads it susceptible to more interrupts from other background processes occurring in the JVM (like garbage collection).
Generally speaking, these kinds of benchmarks are rife with assumptions that don't hold. You cannot believe that a micro benchmark really proves what it set out to prove without a lot of evidence proving that the initial measurement (time) isn't actually measuring your task and possibly some other background tasks. If you don't attempt to control for background tasks, then the measurement is much less useful.