My ultimate goal is to create a comprehensive set of benchmarks for several Java primitive collection libraries using the standard Java collections as a baseline. In the past I have used the looping method of writing these kinds of micro-benchmarks. I put the function I am benchmarking in a loop and iterate 1 million+ times so the jit has a chance to warmup. I take the total time of the loop and then divide by the number of iterations to get an estimate for the amount of time a single call to the function I am benchmarking would take. After recently reading about the JMH project and specifically this example: JMHSample_11_Loops I see the issue with this approach.
My Machine:
Windows 7 64-bit
Core i7-2760QM @ 2.40 GHz
8.00 GB Ram
jdk1.7.0_45 64-bit
Here is a stripped down simple example of the looping method code described above:
public static void main(String[] args) {
HashMap<Long, Long> hmap = new HashMap<Long, Long>();
long val = 0;
//populating the hashmap
for (long idx = 0; idx < 10000000; idx++) {
hmap.put(idx, idx);
}
Stopwatch s = Stopwatch.createStarted();
long x = 0;
for (long idx = 0; idx < 10000000; idx++) {
x = hmap.get(idx);
}
s.stop();
System.out.println(s); //5.522 s
System.out.println(x); //9999999
//5.522 seconds / 10000000 = 552.2 nanoseconds
}
Here is my attempt at rewriting this benchmark using JMH:
package com.test.benchmarks;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
import java.util.HashMap;
import java.util.concurrent.TimeUnit;
@State(Scope.Thread)
public class MyBenchmark {
private HashMap<Long, Long> hmap = new HashMap<Long, Long>();
private long key;
@Setup(Level.Iteration)
public void setup(){
key = 0;
for(long i = 0; i < 10000000; i++) {
hmap.put(i, i);
}
}
@Benchmark
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public long testGetExistingKey() throws InterruptedException{
if(key >= 10000000) key=0;
return hmap.get(key++);
}
public static void main(String[] args) throws RunnerException {
Options opt = new OptionsBuilder()
.include(".*" + MyBenchmark.class.getSimpleName() + ".*")
.warmupIterations(5)
.measurementIterations(25)
.forks(1)
.build();
new Runner(opt).run();
}
}
Here are the results:
Result: 31.163 ±(99.9%) 11.732 ns/op [Average]
Statistics: (min, avg, max) = (0.000, 31.163, 939008.000), stdev = 1831.428
Confidence interval (99.9%): [19.431, 42.895]
Samples, N = 263849
mean = 31.163 ±(99.9%) 11.732 ns/op
min = 0.000 ns/op
p( 0.0000) = 0.000 ns/op
p(50.0000) = 0.000 ns/op
p(90.0000) = 0.000 ns/op
p(95.0000) = 427.000 ns/op
p(99.0000) = 428.000 ns/op
p(99.9000) = 428.000 ns/op
p(99.9900) = 856.000 ns/op
p(99.9990) = 9198.716 ns/op
p(99.9999) = 939008.000 ns/op
max = 939008.000 ns/op
# Run complete. Total time: 00:02:07
Benchmark Mode Samples Score Score error Units
c.t.b.MyBenchmark.testGetExistingKey sample 263849 31.163 11.732 ns/op
As far as I can tell, the same benchmark in JMH has hashmap gets at 31 nanoseconds vs 552 nanoseconds for the looping test. 31 nanoseconds seems a little too fast for me. Looking at Latency Numbers Every Programmer Should Know a main memory reference is around 100 nanoseconds. L2 cache reference is roughly 7 nanoseconds, but the HashMap with 10 million Long keys and values well exceeds L2. Also the JMH results look strange to me. 90% of the get calls take 0.0 nanoseconds?
I am assuming this is user error. Any help/pointers would be appreciated. Thanks.
UPDATE
Here are the results from doing AverageTime
run. This is much more inline with my expectations. Thanks @oleg-estekhin! In the comments below I mentioned that I had done the AverageTime
test previously and had similar results as SampleTime
. I believe doing that run I had used a HashMap with far fewer entries and that the faster lookups did make sense.
Result: 266.306 ±(99.9%) 139.359 ns/op [Average]
Statistics: (min, avg, max) = (27.266, 266.306, 1917.271), stdev = 410.904
Confidence interval (99.9%): [126.947, 405.665]
# Run complete. Total time: 00:07:17
Benchmark Mode Samples Score Score error Units
c.t.b.MyBenchmark.testGetExistingKey avgt 100 266.306 139.359 ns/op
First, looping test measures average time, while your JMH code is configured for sample time. From the
Mode.SampleTime
javadoc:Individual executions of
Map.get()
are pretty fast to the point when the underlying time measurement system will report 0 for some of the executions due to time measurement granularity (read Nanotrusting the Nanotime blog post by the JMH author for more information).In the sample mode the benchmarks collects individual sample times into an array and then calculates averages and percentiles using that array. When more then half of the array values are zero (in your particular setup more than 90% of array values are zero, as indicated by the
p(90.0000) = 0.000 ns/op
) the average is bound to be pretty low but when you seep(50) = 0
(and especiallyp(90) = 0
) in your output the only conclusion you can reliable make is that these results are garbage and you need to find another way to measure that code.You should use
Mode.AverageTime
(orMode.Throughput
) benchmark mode. LeaveMode.SampleTime
for situations when individual invocation takes substantial time.You could add a "baseline" benchmark which executes the
if ()
andkey++
in order to isolate the time required forkey
bookkeeping and the actualMap.get()
time, but you will need to explain the results (the blog post linked above describes pitfalls with subtracting "baselines" from "real" measurements).You could try to use
Blackhole.consumeCPU()
to increase the execution time of individual invocation (see the previous point about "baselines" and associated pitfalls).