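Consider the following JMH benchmark:

```scala
package bench

import org.openjdk.jmh.annotations._

@State(Scope.Benchmark)
@BenchmarkMode(Array(Mode.Throughput))
class So59893913 {
  def seq(xs: Seq[Int]) = xs.sum    // parameter has static type Seq[Int]
  def range(xs: Range) = xs.sum     // parameter has static type Range
  val xs = 1 until 100000000
  @Benchmark def _seq = seq(xs)
  @Benchmark def _range = range(xs)
}
```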
Given that `xs` references the same `Range` object when it is passed as an argument to both the `seq` and `range` methods, dynamic dispatch should invoke the same implementation of `sum`, regardless of the declared static type of the method parameter. Why, then, does the performance seem to differ so drastically, as indicated below?
```
sbt "jmh:run -i 10 -wi 5 -f 2 -t 1 -prof gc bench.So59893913"

[info] Benchmark                                          Mode  Cnt          Score          Error  Units
[info] So59893913._range                                 thrpt   20  334923591.408 ± 22126865.963  ops/s
[info] So59893913._range:·gc.alloc.rate                  thrpt   20         ≈ 10⁻⁴                 MB/sec
[info] So59893913._range:·gc.alloc.rate.norm             thrpt   20         ≈ 10⁻⁷                 B/op
[info] So59893913._range:·gc.count                       thrpt   20            ≈ 0                 counts
[info] So59893913._seq                                   thrpt   20  193509091.399 ±  2347303.746  ops/s
[info] So59893913._seq:·gc.alloc.rate                    thrpt   20       2811.311 ±       34.142  MB/sec
[info] So59893913._seq:·gc.alloc.rate.norm               thrpt   20         16.000 ±        0.001  B/op
[info] So59893913._seq:·gc.churn.PS_Eden_Space           thrpt   20       2811.954 ±       33.656  MB/sec
[info] So59893913._seq:·gc.churn.PS_Eden_Space.norm      thrpt   20         16.004 ±        0.035  B/op
[info] So59893913._seq:·gc.churn.PS_Survivor_Space       thrpt   20          0.013 ±        0.005  MB/sec
[info] So59893913._seq:·gc.churn.PS_Survivor_Space.norm  thrpt   20         ≈ 10⁻⁴                 B/op
[info] So59893913._seq:·gc.count                         thrpt   20       3729.000                 counts
[info] So59893913._seq:·gc.time                          thrpt   20       1864.000                 ms
```
Notice in particular the difference in the `gc.alloc.rate` metrics.
Two things are going on.
The first is that when `xs` has the static type `Range`, the call to `sum` is a monomorphic method call (because `sum` is final in `Range`), and the JVM can easily inline that method and optimize it further. When `xs` has the static type `Seq`, it becomes a virtual call that the JVM has to treat as potentially megamorphic, so it is much less likely to be inlined and fully optimized.

The second is that the methods that get called are not actually the same. The compiler generates two `sum` methods in `Range`. The first one contains the actual implementation that you see in the source code, and it returns an unboxed `int`. The second one is a bridge method: it just calls the other `sum` method and boxes the `int` into a `java.lang.Integer`.

So in your method `seq`, the compiler only knows about the existence of the `sum` method that has return type `java.lang.Object`, and calls that one. It probably doesn't get inlined, and the `java.lang.Integer` that it returns has to be unboxed again so that `seq` can return an `int`. In `range`, the compiler can generate a call to the "real" `sum` method without having to box and unbox the result. The JVM can also do a better job of inlining and optimizing the code.
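
If you want to check this yourself, here is a minimal sketch that uses plain Java reflection to list the `sum` methods visible on `Range` together with their erased return types. The object name `RangeSumBridge` and the helpers `sumMethods`, `viaSeq` and `viaRange` are made up for illustration, and the exact list printed may vary between Scala versions, so treat it as a way to inspect your own setup, not as a reference listing.

```scala
object RangeSumBridge {
  // Public methods named "sum" on Range, as (erased return type, isBridge) pairs.
  // Uses only java.lang.Class#getMethods and java.lang.reflect.Method accessors.
  def sumMethods: Seq[(String, Boolean)] =
    classOf[Range].getMethods.toSeq
      .filter(_.getName == "sum")
      .map(m => (m.getReturnType.getName, m.isBridge))

  // Same shape as the benchmark's methods: only the static parameter type differs,
  // which decides which `sum` signature the compiler emits a call to.
  def viaSeq(xs: Seq[Int]): Int = xs.sum
  def viaRange(xs: Range): Int = xs.sum

  def main(args: Array[String]): Unit = {
    sumMethods.foreach { case (ret, bridge) =>
      println(s"sum: returns $ret, bridge = $bridge")
    }
    val xs = 1 until 101
    // Both paths compute the same value; they differ only in how `sum` is reached.
    println(viaSeq(xs) == viaRange(xs))
  }
}
```

As a side note, a `java.lang.Integer` typically occupies 16 bytes on a 64-bit HotSpot JVM with compressed object pointers (an assumption about the JVM used in the question), which would be consistent with the 16 B/op reported for `_seq`: one boxed result per call.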