So I wanted to benchmark some basic java functionality to add some imformation to this question: What is the gain from declaring a method as static.
I know writing benchmarks is sometimes not easy but what happens here I cannot explain.
Please note that I'm not interessted in how to fix this but on why this happens*
The Test class:
public class TestPerformanceOfStaticVsDynamicCalls {
private static final long RUNS = 1_000_000_000L;
public static void main( String [] args ){
new TestPerformanceOfStaticVsDynamicCalls().run();
}
private void run(){
long r=0;
long start, end;
for( int loop = 0; loop<10; loop++ ){
// Benchmark
start = System.currentTimeMillis();
for( long i = 0; i < RUNS; i++ ) {
r += addStatic( 1, i );
}
end = System.currentTimeMillis();
System.out.println( "Static: " + ( end - start ) + " ms" );
start = System.currentTimeMillis();
for( long i = 0; i < RUNS; i++ ) {
r += addDynamic( 1, i );
}
end = System.currentTimeMillis();
System.out.println( "Dynamic: " + ( end - start ) + " ms" );
// Do something with r to keep compiler happy
System.out.println( r );
}
}
private long addDynamic( long a, long b ){
return a+b;
}
private static long addStatic( long a, long b ){
return a+b;
}
}
I was expecting the first loop to be warmup and the following loops to be faster.
Running this in Eclipse gave those strange results:
Static: 621 ms
Dynamic: 631 ms
1000000001000000000
Static: 2257 ms
Dynamic: 2501 ms
2000000002000000000
Static: 2258 ms
Dynamic: 2469 ms
3000000003000000000
Static: 2231 ms
Dynamic: 2464 ms
4000000004000000000
So wtf? It got slower. To crosscheck I run the same code with java/c 7:
Static: 620 ms
Dynamic: 627 ms
1000000001000000000
Static: 897 ms
Dynamic: 617 ms
2000000002000000000
Static: 901 ms
Dynamic: 615 ms
3000000003000000000
Static: 888 ms
Dynamic: 616 ms
4000000004000000000
So here only the static calls got slower for the following loops. Even more strange if I rearange the code to only print r
after the final loop I get this in Eclipse:
Static: 620 ms
Dynamic: 635 ms
Static: 2285 ms
Dynamic: 893 ms
Static: 2258 ms
Dynamic: 900 ms
Static: 2280 ms
Dynamic: 905 ms
4000000004000000000
And this with java/c 7:
Static: 620 ms
Dynamic: 623 ms
Static: 890 ms
Dynamic: 614 ms
Static: 890 ms
Dynamic: 616 ms
Static: 886 ms
Dynamic: 614 ms
4000000004000000000
while at it changing the order of the dynamic/static benchmarks in eclipse:
Dynamic: 618 ms
Static: 626 ms
1000000001000000000
Dynamic: 632 ms
Static: 2524 ms
2000000002000000000
Dynamic: 617 ms
Static: 2528 ms
3000000003000000000
Dynamic: 622 ms
Static: 2506 ms
4000000004000000000
and in java/c 7:
Dynamic: 625 ms
Static: 646 ms
1000000001000000000
Dynamic: 2470 ms
Static: 633 ms
2000000002000000000
Dynamic: 2459 ms
Static: 635 ms
3000000003000000000
Dynamic: 2464 ms
Static: 645 ms
4000000004000000000
So what is happening here?
EDIT: Some System informations:
Java version "1.7.0_55"
OpenJDK Runtime Environment (IcedTea 2.4.7) (7u55-2.4.7-1ubuntu1)
OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode)
Intel(R) Core(TM) i7-2720QM CPU @ 2.20GHz
EDIT2:
Using Java8:
Static: 620 ms
Dynamic: 624 ms
1000000001000000000
Static: 890 ms
Dynamic: 618 ms
2000000002000000000
Static: 891 ms
Dynamic: 616 ms
3000000003000000000
Static: 892 ms
Dynamic: 617 ms
4000000004000000000
Other code orderings produce similar strange (but ohter) results here.
Preamble: Writing microbenchmarks manually is almost always doomed to a failure.
There are frameworks that have already solved the common benchmarking problems.
JIT compilation unit is a method. Incorporating several benchmarks into a single method leads to unpredictable results.
JIT heavily relies on the execution profile, i.e. the run-time statistics. If a method runs the first scenario for a long time, JIT will optimize the generated code for it. When the method suddenly switches to another scenario, do not expect it to run at the same speed.
JIT may skip optimizing the code that is not executed. It will leave an uncommon trap for this code. If the trap is ever hit, JVM will deoptimize the compiled method, switch to interpreter and after that recompile the code with the new knowledge. E.g. when your method run
is compiled for the first time inside the first hot loop, JIT does not know about System.out.println
yet. As soon as the execution reaches println
, the earlier compiled code is likely to get deoptimized.
The bigger is method - the harder is to optimize it for JIT compiler. E.g. it may appear that there is not enough spare registers to hold all local variables. That's what happen in your case.
To sum it up, your benchmark seem to pass through the following scenario:
- The first hot loop (
addStatic
) triggers the compilation of run
method. The execution profile does not know anything except addStatic
method.
System.out.println
triggers the deoptimization and after that the second hot loop (addDynamic
) leads to recompilation of run
method.
- Now the execution profile contains information only about
addDynamic
, so JIT optimizes the second loop, and the first loop appears to have extra register spills:
Optimized loop:
0x0000000002d01054: add %rbx,%r14
0x0000000002d01057: add $0x1,%rbx ;*ladd
; - TestPerformanceOfStaticVsDynamicCalls::addDynamic@2
; - TestPerformanceOfStaticVsDynamicCalls::run@105
0x0000000002d0105b: add $0x1,%r14 ; OopMap{rbp=Oop off=127}
;*goto
; - TestPerformanceOfStaticVsDynamicCalls::run@116
0x0000000002d0105f: test %eax,-0x1c91065(%rip) # 0x0000000001070000
;*lload
; - TestPerformanceOfStaticVsDynamicCalls::run@92
; {poll}
0x0000000002d01065: cmp $0x3b9aca00,%rbx
0x0000000002d0106c: jl 0x0000000002d01054
Loop with an extra register spill:
0x0000000002d011d0: mov 0x28(%rsp),%r11 <---- the problem is here
0x0000000002d011d5: add %r10,%r11
0x0000000002d011d8: add $0x1,%r10
0x0000000002d011dc: add $0x1,%r11
0x0000000002d011e0: mov %r11,0x28(%rsp) ;*ladd
; - TestPerformanceOfStaticVsDynamicCalls::addStatic@2
; - TestPerformanceOfStaticVsDynamicCalls::run@33
0x0000000002d011e5: mov 0x28(%rsp),%r11 <---- the problem is here
0x0000000002d011ea: add $0x1,%r11 ; OopMap{[32]=Oop off=526}
;*goto
; - TestPerformanceOfStaticVsDynamicCalls::run@44
0x0000000002d011ee: test %eax,-0x1c911f4(%rip) # 0x0000000001070000
;*goto
; - TestPerformanceOfStaticVsDynamicCalls::run@44
; {poll}
0x0000000002d011f4: cmp $0x3b9aca00,%r10
0x0000000002d011fb: jl 0x0000000002d011d0 ;*ifge
; - TestPerformanceOfStaticVsDynamicCalls::run@25
P.S. The following JVM options are useful to analyze the JIT compilation:
-XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -XX:+PrintAssembly -XX:CompileOnly=TestPerformanceOfStaticVsDynamicCalls
It looks like is the way Java is adding values to variable r
.
I've made a few changes, adding method run2()
:
public class TestPerformanceOfStaticVsDynamicCalls {
private static final long RUNS = 1_000_000_000L;
public static void main(String[] args) {
System.out.println("Test run 1 =================================");
new TestPerformanceOfStaticVsDynamicCalls().run();
System.out.println("Test run 2 =================================");
new TestPerformanceOfStaticVsDynamicCalls().run2();
}
private void run2() {
long r = 0;
long start, end;
for (int loop = 0; loop < 10; loop++) {
// Benchmark
long stat = 0;
start = System.currentTimeMillis();
for (long i = 0; i < RUNS; i++) {
stat += addStatic(1, i);
}
end = System.currentTimeMillis();
System.out.println("Static: " + (end - start) + " ms");
long dyna = 0;
start = System.currentTimeMillis();
for (long i = 0; i < RUNS; i++) {
dyna += addDynamic(1, i);
}
end = System.currentTimeMillis();
System.out.println("Dynamic: " + (end - start) + " ms");
// If you really want to have values in "r" then...
r += stat + dyna;
// Do something with r to keep compiler happy
System.out.println(r);
}
}
private void run() {
long r = 0;
long start, end;
for (int loop = 0; loop < 10; loop++) {
// Benchmark
start = System.currentTimeMillis();
for (long i = 0; i < RUNS; i++) {
r += addStatic(1, i);
}
end = System.currentTimeMillis();
System.out.println("Static: " + (end - start) + " ms");
start = System.currentTimeMillis();
for (long i = 0; i < RUNS; i++) {
r += addDynamic(1, i);
}
end = System.currentTimeMillis();
System.out.println("Dynamic: " + (end - start) + " ms");
// If you really want to have values in "r" then...
// Do something with r to keep compiler happy
System.out.println(r);
}
}
private long addDynamic(long a, long b) {
return a + b;
}
private static long addStatic(long a, long b) {
return a + b;
}
}
The results for are:
Test run 1 =================================
Static: 582 ms
Dynamic: 579 ms
1000000001000000000
Static: 2065 ms
Dynamic: 2352 ms
2000000002000000000
Static: 2084 ms
Dynamic: 2345 ms
3000000003000000000
Static: 2095 ms
Dynamic: 2347 ms
4000000004000000000
Static: 2102 ms
Dynamic: 2338 ms
5000000005000000000
Static: 2073 ms
Dynamic: 2345 ms
6000000006000000000
Static: 2074 ms
Dynamic: 2341 ms
7000000007000000000
Static: 2102 ms
Dynamic: 2355 ms
8000000008000000000
Static: 2062 ms
Dynamic: 2354 ms
9000000009000000000
Static: 2057 ms
Dynamic: 2350 ms
-8446744063709551616
Test run 2 =================================
Static: 584 ms
Dynamic: 582 ms
1000000001000000000
Static: 587 ms
Dynamic: 577 ms
2000000002000000000
Static: 577 ms
Dynamic: 579 ms
3000000003000000000
Static: 577 ms
Dynamic: 577 ms
4000000004000000000
Static: 578 ms
Dynamic: 579 ms
5000000005000000000
Static: 578 ms
Dynamic: 580 ms
6000000006000000000
Static: 577 ms
Dynamic: 579 ms
7000000007000000000
Static: 578 ms
Dynamic: 577 ms
8000000008000000000
Static: 580 ms
Dynamic: 578 ms
9000000009000000000
Static: 576 ms
Dynamic: 579 ms
-8446744063709551616
As for why adding directly to r
, I have no clue. Maybe somebody can provide more insights on why accessing r
inside the loop block
makes things much slower.
Just one additional Note. I can only observe this strange behavior if I use long
for r
and i
s. If I convert them to int
then I get these timings:
Static: 352 ms
Dynamic: 353 ms
Static: 348 ms
Dynamic: 349 ms
Static: 349 ms
Dynamic: 348 ms
Static: 349 ms
Dynamic: 344 ms
So one possible conclusion is to avoid long
in those situations. At least with Linux/Amd64 Java7 where perfomance matters.