Why is the StringBuilder chaining pattern sb.appen

I have a microbenchmark that shows very strange results:

@BenchmarkMode(Mode.Throughput)
@Fork(1)
@State(Scope.Thread)
@Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS, batchSize = 1000)
@Measurement(iterations = 40, time = 1, timeUnit = TimeUnit.SECONDS, batchSize = 1000)
public class Chaining {

    private String a1 = "111111111111111111111111";
    private String a2 = "222222222222222222222222";
    private String a3 = "333333333333333333333333";

    @Benchmark
    public String typicalChaining() {
        return new StringBuilder().append(a1).append(a2).append(a3).toString();
    }

    @Benchmark
    public String noChaining() {
        StringBuilder sb = new StringBuilder();
        sb.append(a1);
        sb.append(a2);
        sb.append(a3);
        return sb.toString();
    }
}

I'm expecting the results of both tests to be the same or at least very close. However, the difference is almost 5x:

# Run complete. Total time: 00:01:41

Benchmark                  Mode  Cnt      Score     Error  Units
Chaining.noChaining       thrpt   40   8538.236 ± 209.924  ops/s
Chaining.typicalChaining  thrpt   40  36729.523 ± 988.936  ops/s

Does anybody know how that is possible?

String concatenation a + b + c is a very frequent pattern in Java programs, so HotSpot JVM has a special optimization for it: -XX:+OptimizeStringConcat which is ON by default.

HotSpot JVM recognizes new StringBuilder().append()...append().toString() pattern in the bytecode and translates it to the optimized machine code without calling actual Java methods and without allocating intermediate objects. I.e. this is a kind of compound JVM intrinsic.

Here is the source code for this optimization.

On the other side, sb.append(); sb.append(); ... is not handled specially. This sequence is compiled just like a regular Java method calls.

If you rerun the benchmark with -XX:-OptimizeStringConcat, the performance will be the same for both variants.