I've seen a lot of threads here that try to answer which is faster: newInstance or the new operator.
Looking at the source code, newInstance looks like it should be much slower: it performs many security checks and uses reflection. So I decided to measure, starting with jdk-8. Here is the code, using jmh.
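import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@BenchmarkMode(value = { Mode.AverageTime, Mode.SingleShotTime })
@Warmup(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@State(Scope.Benchmark)
public class TestNewObject {

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder().include(TestNewObject.class.getSimpleName()).build();
        new Runner(opt).run();
    }

    // Baseline: plain allocation through the new operator.
    @Fork(1)
    @Benchmark
    public Something newOperator() {
        return new Something();
    }

    // Reflective allocation through Class.newInstance (deprecated since JDK 9).
    @SuppressWarnings("deprecation")
    @Fork(1)
    @Benchmark
    public Something newInstance() throws InstantiationException, IllegalAccessException {
        return Something.class.newInstance();
    }

    static class Something {
    }
}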
I don't think there are big surprises here (JIT does a lot of optimizations that make this difference not that big):
Benchmark                  Mode  Cnt      Score      Error  Units
TestNewObject.newInstance  avgt    5      7.762 ±    0.745  ns/op
TestNewObject.newOperator  avgt    5      4.714 ±    1.480  ns/op
TestNewObject.newInstance    ss    5  10666.200 ± 4261.855  ns/op
TestNewObject.newOperator    ss    5   1522.800 ± 2558.524  ns/op
In hot code the difference is about 1.6x; for single-shot time it is much worse (about 7x).
Now I switch to jdk-9 (build 157 in case it matters) and run the same code.
And the results:
Benchmark                  Mode  Cnt      Score      Error  Units
TestNewObject.newInstance  avgt    5    314.307 ±   55.054  ns/op
TestNewObject.newOperator  avgt    5      4.602 ±    1.084  ns/op
TestNewObject.newInstance    ss    5  10798.400 ± 5090.458  ns/op
TestNewObject.newOperator    ss    5   3269.800 ± 4545.827  ns/op
That's a whopping difference of roughly 68x in hot code. I'm using the latest jmh version (1.19.SNAPSHOT).
After adding one more method to the test:
    @Fork(1)
    @Benchmark
    public Something newInstanceJDK9() throws Exception {
        return Something.class.getDeclaredConstructor().newInstance();
    }
Here are the overall results on jdk-9:
TestNewObject.newInstance      avgt    5  308.342 ± 107.563  ns/op
TestNewObject.newInstanceJDK9  avgt    5   50.659 ±   7.964  ns/op
TestNewObject.newOperator      avgt    5    4.554 ±   0.616  ns/op
Can someone shed some light on why there is such a big difference?
First of all, the problem has nothing to do with the module system (directly).
I noticed that even with JDK 9 the first warmup iteration of newInstance was as fast as with JDK 8.
# Fork: 1 of 1
# Warmup Iteration 1: 10,578 ns/op <-- Fast!
# Warmup Iteration 2: 246,426 ns/op
# Warmup Iteration 3: 242,347 ns/op
This means something went wrong in JIT compilation. -XX:+PrintCompilation confirmed that the benchmark was recompiled after the first iteration:
10,762 ns/op
# Warmup Iteration 2: 1541 689 ! 3 java.lang.Class::newInstance (160 bytes) made not entrant
1548 692 % 4 bench.generated.NewInstance_newInstance_jmhTest::newInstance_avgt_jmhStub @ 13 (56 bytes)
1552 693 4 bench.generated.NewInstance_newInstance_jmhTest::newInstance_avgt_jmhStub (56 bytes)
1555 662 3 bench.generated.NewInstance_newInstance_jmhTest::newInstance_avgt_jmhStub (56 bytes) made not entrant
248,023 ns/op
Then -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining pointed to an inlining problem:
1577 667 % 4 bench.generated.NewInstance_newInstance_jmhTest::newInstance_avgt_jmhStub @ 13 (56 bytes)
@ 17 bench.NewInstance::newInstance (6 bytes) inline (hot)
! @ 2 java.lang.Class::newInstance (160 bytes) already compiled into a big method
The "already compiled into a big method" message means that the compiler failed to inline the Class.newInstance call because the compiled size of the callee is larger than the InlineSmallCode value (2000 by default).
When I reran the benchmark with -XX:InlineSmallCode=2500, it became fast again.
Benchmark                Mode  Cnt  Score   Error  Units
NewInstance.newInstance  avgt    5  8,847 ± 0,080  ns/op
NewInstance.operatorNew  avgt    5  5,042 ± 0,177  ns/op
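To see which InlineSmallCode limit a given JVM is actually running with (for example, to confirm that the flag reached the forked benchmark JVM), the value can be read at runtime. The following is only a minimal sketch: the class name InlineLimitProbe and the helper vmOption are made up for illustration, and it assumes a HotSpot-based JVM that exposes com.sun.management.HotSpotDiagnosticMXBean. On other JVMs it simply reports the value as unavailable.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.Optional;

// Hypothetical helper class, not part of the original benchmark.
class InlineLimitProbe {

    // Returns the current value of a HotSpot flag, or empty if this JVM
    // does not expose the flag (non-HotSpot JVM, or the flag does not exist).
    static Optional<String> vmOption(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return Optional.empty();
            }
            // getVMOption throws IllegalArgumentException for unknown flags.
            return Optional.of(bean.getVMOption(name).getValue());
        } catch (RuntimeException | LinkageError e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println("InlineSmallCode = "
                + vmOption("InlineSmallCode").orElse("<unavailable>"));
    }
}
```

Running this once with default settings and once with -XX:InlineSmallCode=2500 should show the two different limits, assuming the flag is available in that build. The default differs between platforms and JDK versions, so the printed value may not be 2000.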
JDK 9 now has G1 as the default GC. If I fall back to Parallel GC, the benchmark is also fast, even with the default InlineSmallCode.
Rerunning the JDK 9 benchmark with -XX:+UseParallelGC:
Benchmark                Mode  Cnt  Score   Error  Units
NewInstance.newInstance  avgt    5  8,728 ± 0,143  ns/op
NewInstance.operatorNew  avgt    5  4,822 ± 0,096  ns/op
G1 needs to emit barriers whenever an object reference is stored, so the compiled code becomes a bit larger and Class.newInstance exceeds the default InlineSmallCode limit. Another reason the compiled Class.newInstance has grown is that the reflection code was slightly rewritten in JDK 9.
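Since the result depends on which collector is active, it can help to log the collector from inside the JVM being measured. The sketch below uses the standard java.lang.management API. The class name ActiveCollectors and the looksLikeG1 check are illustrative assumptions; the check relies on the naming convention HotSpot uses for its G1 beans (names beginning with "G1 "), which is not a documented contract.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original benchmark.
class ActiveCollectors {

    // Names of the collectors registered in the running JVM,
    // e.g. "G1 Young Generation" or "PS Scavenge" on HotSpot.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // Heuristic only: treats any bean whose name starts with "G1 " as G1.
    static boolean looksLikeG1(List<String> names) {
        for (String name : names) {
            if (name.startsWith("G1 ")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> names = collectorNames();
        System.out.println("Collectors: " + names + ", G1 active: " + looksLikeG1(names));
    }
}
```

Started with -XX:+UseParallelGC, a HotSpot JVM would be expected to list the "PS" collectors instead, which makes it easy to double-check that a GC flag was actually applied.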
TL;DR The JIT failed to inline Class.newInstance because the InlineSmallCode limit was exceeded. The compiled version of Class.newInstance has become larger due to changes in the reflection code in JDK 9 and because the default GC has changed to G1.
The implementation of Class.newInstance() is mostly identical, except for the following part:
Java 8:
    Constructor<T> tmpConstructor = cachedConstructor;
    // Security check (same as in java.lang.reflect.Constructor)
    int modifiers = tmpConstructor.getModifiers();
    if (!Reflection.quickCheckMemberAccess(this, modifiers)) {
        Class<?> caller = Reflection.getCallerClass();
        if (newInstanceCallerCache != caller) {
            Reflection.ensureMemberAccess(caller, this, null, modifiers);
            newInstanceCallerCache = caller;
        }
    }
Java 9:
    Constructor<T> tmpConstructor = cachedConstructor;
    // Security check (same as in java.lang.reflect.Constructor)
    Class<?> caller = Reflection.getCallerClass();
    if (newInstanceCallerCache != caller) {
        int modifiers = tmpConstructor.getModifiers();
        Reflection.ensureMemberAccess(caller, this, null, modifiers);
        newInstanceCallerCache = caller;
    }
As you can see, Java 8 had a quickCheckMemberAccess call that made it possible to bypass expensive operations such as Reflection.getCallerClass(). This quick check has been removed, I'd guess because it wasn't compatible with the new module access rules.
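To make the idea of the quick check concrete, here is a rough model of it. This is an approximation written for illustration, not the JDK source: it assumes the Java 8 check boils down to "the class and the member are both public", and it uses Class.getModifiers() where the JDK uses an internal lookup of the class access flags. The names QuickCheckSketch, passesQuickCheck, PublicThing and PackagePrivateThing are hypothetical.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Modifier;

// Hypothetical model of the Java 8 fast path; not the actual JDK code.
class QuickCheckSketch {

    public static class PublicThing {
        public PublicThing() { }
    }

    static class PackagePrivateThing {
        // Implicit default constructor with package-private access.
    }

    // Assumed rule: the check passes only if the PUBLIC bit is set on
    // both the declaring class and the member being accessed.
    static boolean quickCheck(Class<?> memberClass, int memberModifiers) {
        return Modifier.isPublic(memberClass.getModifiers() & memberModifiers);
    }

    // Applies the check to the no-arg constructor, as newInstance would need.
    static boolean passesQuickCheck(Class<?> type) {
        try {
            Constructor<?> ctor = type.getDeclaredConstructor();
            return quickCheck(type, ctor.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException("no no-arg constructor on " + type, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("public class: " + passesQuickCheck(PublicThing.class));
        System.out.println("package-private class: " + passesQuickCheck(PackagePrivateThing.class));
    }
}
```

Under this approximation, only a public class with a public constructor skips the caller lookup; a package-private class would fall through to the slower caller-based path on Java 8 as well.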
But there's more to it. The JVM might optimize reflective instantiation of a predictable type, and Something.class.newInstance() refers to a perfectly predictable type. This optimization might have become less effective. There are several possible reasons:
- the new module access rules complicate the process
- since Class.newInstance() has been deprecated, some support has been deliberately removed (seems unlikely to me)
- due to the changed implementation code shown above, HotSpot fails to recognize certain code patterns that trigger the optimizations