I ran some JMH benchmarks comparing a lambda with a method reference, along these lines:
IntStream......reduce(Integer::max)
vs.
IntStream.......reduce((i1, i2) -> Integer.max(i1, i2))
What I noticed was that, in Java 8, the method reference was about 5 times as fast as the lambda.
When I ran the test in Java 11, both approaches took about as long as the method reference did in Java 8. So there is no major difference in performance between lambda and method reference in Java 11.
My question is: What improvement(s) have been made from Java 8 to 11 to boost this performance?
I'm using OpenJDK.
EDIT
My benchmark:
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
@Fork(value = 1, jvmArgs = {"-XX:CompileThreshold=5000"})
@Warmup(iterations = 2)
public class FindMaxInt {
@Param({"10000", "1000000", "10000000"})
private int n;
private List<Integer> data;
@Setup
public void setup(){
data = createData();
}
@Benchmark
public void streamWithMethodReference(final Blackhole blackhole){
int max = data.stream().mapToInt(Integer::intValue).reduce(Integer.MIN_VALUE, Integer::max);
blackhole.consume(max);
}
@Benchmark
public void streamWithLambda(final Blackhole blackhole){
int max = data.stream().mapToInt(Integer::intValue).reduce(Integer.MIN_VALUE, (i1, i2) -> Integer.max(i1, i2));
blackhole.consume(max);
}
}
This is a combination of the effects described in this and this answer.
The different results are explained by a different inlining tree. A lambda has one more level of indirection than a method reference, so during JIT compilation the expression with the lambda may reach the inlining depth limit earlier. The default is -XX:MaxInlineLevel=9.
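The extra level of indirection can be made visible with a small sketch. It assumes the class is compiled with javac, which moves a lambda body into a synthetic method of the enclosing class, while a reference to a static method such as Integer::max is linked to the target directly. The class name LambdaIndirection and its helper method are made up for this illustration.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntBinaryOperator;

public class LambdaIndirection {
    // Method reference: the generated function object calls Integer.max directly.
    static final IntBinaryOperator METHOD_REF = Integer::max;

    // Lambda: the body lives in a synthetic method of this class (assumption: javac),
    // and only that synthetic method calls Integer.max. This is the extra call level.
    static final IntBinaryOperator LAMBDA = (i1, i2) -> Integer.max(i1, i2);

    // Collects the names of the synthetic lambda methods the compiler added to this class.
    static List<String> syntheticLambdaMethods() {
        List<String> names = new ArrayList<>();
        for (Method m : LambdaIndirection.class.getDeclaredMethods()) {
            if (m.isSynthetic() && m.getName().startsWith("lambda$")) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Both operators compute the same result; they differ only in call depth.
        System.out.println(METHOD_REF.applyAsInt(3, 7));
        System.out.println(LAMBDA.applyAsInt(3, 7));
        // Only the lambda should show up here; the method reference adds no method.
        System.out.println(syntheticLambdaMethods());
    }
}
```

In the benchmark the corresponding synthetic method is the lambda$streamWithLambda$0 frame that appears in the inlining output.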
Run the benchmark with -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining to see the whole inlining tree. Here is what we get on JDK 8:
1563 560 4 bench.FindMaxInt::streamWithLambda (38 bytes)
@ 3 java.util.stream.IntPipeline::<init> (7 bytes) inline (hot)
@ 3 java.util.stream.AbstractPipeline::<init> (91 bytes) inline (hot)
@ 1 java.util.stream.PipelineHelper::<init> (5 bytes) inline (hot)
@ 1 java.lang.Object::<init> (1 bytes) inline (hot)
@ 51 java.util.stream.StreamOpFlag::combineOpFlags (9 bytes) inline (hot)
@ 2 java.util.stream.StreamOpFlag::getMask (30 bytes) inline (hot)
@ 66 java.util.stream.IntPipeline$StatelessOp::opIsStateful (2 bytes) inline (hot)
@ 4 java.util.Collection::stream (11 bytes) inline (hot)
\-> TypeProfile (5120/5120 counts) = java/util/ArrayList
@ 1 java.util.ArrayList::spliterator (12 bytes) inline (hot)
@ 8 java.util.ArrayList$ArrayListSpliterator::<init> (26 bytes) inline (hot)
@ 1 java.lang.Object::<init> (1 bytes) inline (hot)
@ 7 java.util.stream.StreamSupport::stream (19 bytes) inline (hot)
@ 1 java.util.Objects::requireNonNull (14 bytes) inline (hot)
@ 11 java.util.stream.StreamOpFlag::fromCharacteristics (37 bytes) inline (hot)
@ 1 java.util.ArrayList$ArrayListSpliterator::characteristics (4 bytes) inline (hot)
\-> TypeProfile (5124/5124 counts) = java/util/ArrayList$ArrayListSpliterator
@ 15 java.util.stream.ReferencePipeline$Head::<init> (8 bytes) inline (hot)
@ 4 java.util.stream.ReferencePipeline::<init> (8 bytes) inline (hot)
@ 4 java.util.stream.AbstractPipeline::<init> (55 bytes) inline (hot)
@ 1 java.util.stream.PipelineHelper::<init> (5 bytes) inline (hot)
@ 1 java.lang.Object::<init> (1 bytes) inline (hot)
@ 9 java.lang.invoke.LambdaForm$MH/883049899::linkToTargetMethod (8 bytes) force inline by annotation
@ 4 java.lang.invoke.LambdaForm$MH/1922154895::identity_L (8 bytes) force inline by annotation
@ 14 java.util.stream.ReferencePipeline::mapToInt (26 bytes) inline (hot)
\-> TypeProfile (5120/5120 counts) = java/util/stream/ReferencePipeline$Head
@ 1 java.util.Objects::requireNonNull (14 bytes) inline (hot)
@ 22 java.util.stream.ReferencePipeline$4::<init> (20 bytes) inline (hot)
@ 16 java.util.stream.IntPipeline$StatelessOp::<init> (29 bytes) inline (hot)
@ 3 java.util.stream.IntPipeline::<init> (7 bytes) inline (hot)
@ 3 java.util.stream.AbstractPipeline::<init> (91 bytes) inline (hot)
@ 1 java.util.stream.PipelineHelper::<init> (5 bytes) inline (hot)
@ 1 java.lang.Object::<init> (1 bytes) inline (hot)
@ 51 java.util.stream.StreamOpFlag::combineOpFlags (9 bytes) inline (hot)
@ 2 java.util.stream.StreamOpFlag::getMask (30 bytes) inline (hot)
@ 66 java.util.stream.IntPipeline$StatelessOp::opIsStateful (2 bytes) inline (hot)
@ 21 java.lang.invoke.LambdaForm$MH/883049899::linkToTargetMethod (8 bytes) force inline by annotation
@ 4 java.lang.invoke.LambdaForm$MH/1922154895::identity_L (8 bytes) force inline by annotation
@ 26 java.util.stream.IntPipeline::reduce (16 bytes) inline (hot)
\-> TypeProfile (5120/5120 counts) = java/util/stream/ReferencePipeline$4
@ 3 java.util.stream.ReduceOps::makeInt (18 bytes) inline (hot)
@ 1 java.util.Objects::requireNonNull (14 bytes) inline (hot)
@ 14 java.util.stream.ReduceOps$5::<init> (16 bytes) inline (hot)
@ 12 java.util.stream.ReduceOps$ReduceOp::<init> (10 bytes) inline (hot)
@ 1 java.lang.Object::<init> (1 bytes) inline (hot)
@ 6 java.util.stream.AbstractPipeline::evaluate (94 bytes) inline (hot)
@ 50 java.util.stream.AbstractPipeline::isParallel (8 bytes) inline (hot)
@ 80 java.util.stream.TerminalOp::getOpFlags (2 bytes) inline (hot)
\-> TypeProfile (5130/5130 counts) = java/util/stream/ReduceOps$5
@ 85 java.util.stream.AbstractPipeline::sourceSpliterator (265 bytes) inline (hot)
@ 79 java.util.stream.AbstractPipeline::isParallel (8 bytes) inline (hot)
@ 88 java.util.stream.ReduceOps$ReduceOp::evaluateSequential (18 bytes) inline (hot)
@ 2 java.util.stream.ReduceOps$5::makeSink (5 bytes) inline (hot)
@ 1 java.util.stream.ReduceOps$5::makeSink (16 bytes) inline (hot)
@ 12 java.util.stream.ReduceOps$5ReducingSink::<init> (15 bytes) inline (hot)
@ 11 java.lang.Object::<init> (1 bytes) inline (hot)
@ 6 java.util.stream.AbstractPipeline::wrapAndCopyInto (18 bytes) inline (hot)
@ 3 java.util.Objects::requireNonNull (14 bytes) inline (hot)
@ 9 java.util.stream.AbstractPipeline::wrapSink (37 bytes) inline (hot)
@ 1 java.util.Objects::requireNonNull (14 bytes) inline (hot)
@ 23 java.util.stream.ReferencePipeline$4::opWrapSink (10 bytes) inline (hot)
\-> TypeProfile (5081/5081 counts) = java/util/stream/ReferencePipeline$4
@ 6 java.util.stream.ReferencePipeline$4$1::<init> (11 bytes) inline (hot)
@ 7 java.util.stream.Sink$ChainedReference::<init> (16 bytes) inline (hot)
@ 1 java.lang.Object::<init> (1 bytes) inline (hot)
@ 6 java.util.Objects::requireNonNull (14 bytes) inline (hot)
@ 13 java.util.stream.AbstractPipeline::copyInto (53 bytes) inline (hot)
@ 1 java.util.Objects::requireNonNull (14 bytes) inline (hot)
@ 9 java.util.stream.AbstractPipeline::getStreamAndOpFlags (5 bytes) accessor
@ 12 java.util.stream.StreamOpFlag::isKnown (19 bytes) inline (hot)
@ 20 java.util.Spliterator::getExactSizeIfKnown (25 bytes) inline (hot)
\-> TypeProfile (5081/5081 counts) = java/util/ArrayList$ArrayListSpliterator
@ 1 java.util.ArrayList$ArrayListSpliterator::characteristics (4 bytes) inline (hot)
@ 19 java.util.ArrayList$ArrayListSpliterator::estimateSize (11 bytes) inline (hot)
@ 1 java.util.ArrayList$ArrayListSpliterator::getFence (48 bytes) inline (hot)
@ 38 java.util.ArrayList::access$000 (5 bytes) accessor
@ 25 java.util.stream.Sink$ChainedReference::begin (11 bytes) inline (hot)
\-> TypeProfile (5081/5081 counts) = java/util/stream/ReferencePipeline$4$1
@ 5 java.util.stream.ReduceOps$5ReducingSink::begin (9 bytes) inline (hot)
\-> TypeProfile (5079/5079 counts) = java/util/stream/ReduceOps$5ReducingSink
@ 32 java.util.ArrayList$ArrayListSpliterator::forEachRemaining (129 bytes) inline (hot)
@ 51 java.util.ArrayList::access$000 (5 bytes) accessor
@ 99 java.util.stream.ReferencePipeline$4$1::accept (23 bytes) inline (hot)
@ 12 bench.FindMaxInt$$Lambda$8/390011259::applyAsInt (8 bytes) inline (hot)
\-> TypeProfile (13752/13752 counts) = bench/FindMaxInt$$Lambda$8
@ 4 java.lang.Integer::intValue (5 bytes) accessor
@ 17 java.util.stream.ReduceOps$5ReducingSink::accept (19 bytes) inline (hot)
\-> TypeProfile (13752/13752 counts) = java/util/stream/ReduceOps$5ReducingSink
@ 10 bench.FindMaxInt$$Lambda$9/208515840::applyAsInt (6 bytes) inline (hot)
\-> TypeProfile (9107/9107 counts) = bench/FindMaxInt$$Lambda$9
@ 2 bench.FindMaxInt::lambda$streamWithLambda$0 (6 bytes) inline (hot)
@ 2 java.lang.Integer::max (6 bytes) inlining too deep
@ 38 java.util.stream.Sink$ChainedReference::end (10 bytes) inline (hot)
@ 4 java.util.stream.Sink::end (1 bytes) inline (hot)
\-> TypeProfile (5125/5125 counts) = java/util/stream/ReduceOps$5ReducingSink
@ 12 java.util.stream.ReduceOps$5ReducingSink::get (5 bytes) inline (hot)
@ 1 java.util.stream.ReduceOps$5ReducingSink::get (8 bytes) inline (hot)
@ 4 java.lang.Integer::valueOf (32 bytes) inline (hot)
@ 28 java.lang.Integer::<init> (10 bytes) inline (hot)
@ 1 java.lang.Number::<init> (5 bytes) inline (hot)
@ 1 java.lang.Object::<init> (1 bytes) inline (hot)
@ 12 java.lang.Integer::intValue (5 bytes) accessor
@ 34 org.openjdk.jmh.infra.Blackhole::consume (28 bytes) disallowed by CompilerOracle
The key lines are the following. They show that inlining stops exactly at the final call to Integer.max, because the default limit of 9 levels is reached.
@ 2 bench.FindMaxInt::lambda$streamWithLambda$0 (6 bytes) inline (hot)
@ 2 java.lang.Integer::max (6 bytes) inlining too deep
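To check which limit applies on the JVM at hand, the flag can be read at run time. This is a minimal sketch, assuming a HotSpot-based JVM such as OpenJDK, where com.sun.management.HotSpotDiagnosticMXBean is available; the class name InlineLimit is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class InlineLimit {
    // Returns the current value of -XX:MaxInlineLevel as reported by the running JVM.
    // Assumption: a HotSpot JVM; other JVMs may not provide this MXBean or flag.
    static String maxInlineLevel() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption("MaxInlineLevel").getValue();
    }

    public static void main(String[] args) {
        System.out.println("MaxInlineLevel = " + maxInlineLevel());
    }
}
```

Starting the JVM with a larger -XX:MaxInlineLevel value and re-running the benchmark is one way to test whether the depth limit is really what slows down the lambda version on JDK 8.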
The shape of the inlining tree is very different on JDK 11:
1588 705 4 bench.FindMaxInt::streamWithLambda (38 bytes)
@ 4 java.util.Collection::stream (11 bytes) inline (hot)
\-> TypeProfile (5263/5263 counts) = java/util/ArrayList
@ 1 java.util.ArrayList::spliterator (12 bytes) inline (hot)
@ 8 java.util.ArrayList$ArrayListSpliterator::<init> (26 bytes) inline (hot)
@ 6 java.lang.Object::<init> (1 bytes) inline (hot)
@ 7 java.util.stream.StreamSupport::stream (19 bytes) inline (hot)
@ 1 java.util.Objects::requireNonNull (14 bytes) inline (hot)
@ 11 java.util.stream.StreamOpFlag::fromCharacteristics (37 bytes) inline (hot)
@ 1 java.util.ArrayList$ArrayListSpliterator::characteristics (4 bytes) inline (hot)
\-> TypeProfile (5125/5125 counts) = java/util/ArrayList$ArrayListSpliterator
@ 15 java.util.stream.ReferencePipeline$Head::<init> (8 bytes) inline (hot)
@ 4 java.util.stream.ReferencePipeline::<init> (8 bytes) inline (hot)
@ 4 java.util.stream.AbstractPipeline::<init> (55 bytes) inline (hot)
@ 1 java.util.stream.PipelineHelper::<init> (5 bytes) inline (hot)
@ 1 java.lang.Object::<init> (1 bytes) inline (hot)
@ 9 java.lang.invoke.Invokers$Holder::linkToTargetMethod (8 bytes) force inline by annotation
@ 4 java.lang.invoke.LambdaForm$MH/0x0000000800060440::invoke (8 bytes) force inline by annotation
@ 14 java.util.stream.ReferencePipeline::mapToInt (26 bytes) inline (hot)
\-> TypeProfile (5263/5263 counts) = java/util/stream/ReferencePipeline$Head
@ 1 java.util.Objects::requireNonNull (14 bytes) inline (hot)
@ 22 java.util.stream.ReferencePipeline$4::<init> (20 bytes) inline (hot)
@ 16 java.util.stream.IntPipeline$StatelessOp::<init> (29 bytes) inline (hot)
@ 3 java.util.stream.IntPipeline::<init> (7 bytes) inline (hot)
@ 3 java.util.stream.AbstractPipeline::<init> (91 bytes) inline (hot)
@ 1 java.util.stream.PipelineHelper::<init> (5 bytes) inline (hot)
@ 1 java.lang.Object::<init> (1 bytes) inline (hot)
@ 51 java.util.stream.StreamOpFlag::combineOpFlags (9 bytes) inline (hot)
@ 2 java.util.stream.StreamOpFlag::getMask (30 bytes) inline (hot)
@ 66 java.util.stream.IntPipeline$StatelessOp::opIsStateful (2 bytes) inline (hot)
@ 21 java.lang.invoke.Invokers$Holder::linkToTargetMethod (8 bytes) force inline by annotation
@ 4 java.lang.invoke.LambdaForm$MH/0x0000000800060440::invoke (8 bytes) force inline by annotation
@ 26 java.util.stream.IntPipeline::reduce (16 bytes) inline (hot)
\-> TypeProfile (5263/5263 counts) = java/util/stream/ReferencePipeline$4
@ 3 java.util.stream.ReduceOps::makeInt (18 bytes) inline (hot)
@ 1 java.util.Objects::requireNonNull (14 bytes) inline (hot)
@ 14 java.util.stream.ReduceOps$6::<init> (16 bytes) inline (hot)
@ 12 java.util.stream.ReduceOps$ReduceOp::<init> (10 bytes) inline (hot)
@ 1 java.lang.Object::<init> (1 bytes) inline (hot)
@ 6 java.util.stream.AbstractPipeline::evaluate (94 bytes) inline (hot)
@ 50 java.util.stream.AbstractPipeline::isParallel (8 bytes) inline (hot)
@ 80 java.util.stream.TerminalOp::getOpFlags (2 bytes) inline (hot)
\-> TypeProfile (5362/5362 counts) = java/util/stream/ReduceOps$6
@ 85 java.util.stream.AbstractPipeline::sourceSpliterator (265 bytes) inline (hot)
@ 79 java.util.stream.AbstractPipeline::isParallel (8 bytes) inline (hot)
@ 88 java.util.stream.ReduceOps$ReduceOp::evaluateSequential (18 bytes) already compiled into a big method
@ 12 java.lang.Integer::intValue (5 bytes) accessor
@ 34 org.openjdk.jmh.infra.Blackhole::consume (28 bytes) disallowed by CompileCommand
Here the inlining tree is cut off much earlier, and for a different reason:
@ 88 java.util.stream.ReduceOps$ReduceOp::evaluateSequential (18 bytes) already compiled into a big method
JDK 11 uses G1 as the default garbage collector (the default changed in JDK 9). The compiled code appears larger because of G1 barriers, which is why the inlining heuristics prevented the hottest forEachRemaining loop from being inlined into the streamWithLambda method.
In fact, this is not an optimization in JDK 11, but rather the other way round. However, the overall performance of this particular benchmark turned out better, because the inlining tree was cut off outside the hottest loop.
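Since the explanation above depends on which garbage collector is active, it can be worth confirming that on both JDKs. The following sketch only lists the collector names reported by the standard management API; the class name ActiveGc is hypothetical. The JVM may also choose a collector other than its usual default on small machines, so the output is not assumed here.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ActiveGc {
    // Returns the names of the garbage collectors the running JVM reports.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // With G1 the names typically contain "G1"; other collectors report other names.
        System.out.println(collectorNames());
    }
}
```

Running the benchmark on JDK 11 with a different collector selected explicitly, for example -XX:+UseParallelGC, would be one way to see whether the inlining tree then resembles the JDK 8 one.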