I wanted to do some performance measurements and comparisons of simple for loops and equivalent streams implementations. I believe it's the case that streams will be somewhat slower than equivalent non-streams code, but I wanted to be sure I'm measuring the right things.
I'm including my entire JMH class here:
import java.util.ArrayList;
import java.util.List;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class MyBenchmark {
    List<String> shortLengthListConstantSize = null;
    List<String> mediumLengthListConstantSize = null;
    List<String> longerLengthListConstantSize = null;
    List<String> longLengthListConstantSize = null;

    @Setup
    public void setup() {
        shortLengthListConstantSize = populateList(2);
        mediumLengthListConstantSize = populateList(12);
        longerLengthListConstantSize = populateList(300);
        longLengthListConstantSize = populateList(300000);
    }

    private List<String> populateList(int size) {
        List<String> list = new ArrayList<>();
        for (int ctr = 0; ctr < size; ++ctr) {
            list.add("xxx");
        }
        return list;
    }

    @Benchmark
    public long shortLengthConstantSizeFor() {
        long count = 0;
        for (String val : shortLengthListConstantSize) {
            if (val.length() == 3) { ++count; }
        }
        return count;
    }

    @Benchmark
    public long shortLengthConstantSizeForEach() {
        IntHolder intHolder = new IntHolder();
        shortLengthListConstantSize.forEach(s -> { if (s.length() == 3) ++intHolder.value; });
        return intHolder.value;
    }

    @Benchmark
    public long shortLengthConstantSizeLambda() {
        return shortLengthListConstantSize.stream().filter(s -> s.length() == 3).count();
    }

    @Benchmark
    public long shortLengthConstantSizeLambdaParallel() {
        return shortLengthListConstantSize.stream().parallel().filter(s -> s.length() == 3).count();
    }

    @Benchmark
    public long mediumLengthConstantSizeFor() {
        long count = 0;
        for (String val : mediumLengthListConstantSize) {
            if (val.length() == 3) { ++count; }
        }
        return count;
    }

    @Benchmark
    public long mediumLengthConstantSizeForEach() {
        IntHolder intHolder = new IntHolder();
        mediumLengthListConstantSize.forEach(s -> { if (s.length() == 3) ++intHolder.value; });
        return intHolder.value;
    }

    @Benchmark
    public long mediumLengthConstantSizeLambda() {
        return mediumLengthListConstantSize.stream().filter(s -> s.length() == 3).count();
    }

    @Benchmark
    public long mediumLengthConstantSizeLambdaParallel() {
        return mediumLengthListConstantSize.stream().parallel().filter(s -> s.length() == 3).count();
    }

    @Benchmark
    public long longerLengthConstantSizeFor() {
        long count = 0;
        for (String val : longerLengthListConstantSize) {
            if (val.length() == 3) { ++count; }
        }
        return count;
    }

    @Benchmark
    public long longerLengthConstantSizeForEach() {
        IntHolder intHolder = new IntHolder();
        longerLengthListConstantSize.forEach(s -> { if (s.length() == 3) ++intHolder.value; });
        return intHolder.value;
    }

    @Benchmark
    public long longerLengthConstantSizeLambda() {
        return longerLengthListConstantSize.stream().filter(s -> s.length() == 3).count();
    }

    @Benchmark
    public long longerLengthConstantSizeLambdaParallel() {
        return longerLengthListConstantSize.stream().parallel().filter(s -> s.length() == 3).count();
    }

    @Benchmark
    public long longLengthConstantSizeFor() {
        long count = 0;
        for (String val : longLengthListConstantSize) {
            if (val.length() == 3) { ++count; }
        }
        return count;
    }

    @Benchmark
    public long longLengthConstantSizeForEach() {
        IntHolder intHolder = new IntHolder();
        longLengthListConstantSize.forEach(s -> { if (s.length() == 3) ++intHolder.value; });
        return intHolder.value;
    }

    @Benchmark
    public long longLengthConstantSizeLambda() {
        return longLengthListConstantSize.stream().filter(s -> s.length() == 3).count();
    }

    @Benchmark
    public long longLengthConstantSizeLambdaParallel() {
        return longLengthListConstantSize.stream().parallel().filter(s -> s.length() == 3).count();
    }

    public static class IntHolder {
        public int value = 0;
    }
}
I'm running these on a Win7 laptop. I don't care about absolute measurements, just relative. Here are the latest results from these:
Benchmark                                            Mode  Cnt          Score         Error  Units
MyBenchmark.longLengthConstantSizeFor               thrpt  200       2984.554 ±      57.557  ops/s
MyBenchmark.longLengthConstantSizeForEach           thrpt  200       2971.701 ±     110.414  ops/s
MyBenchmark.longLengthConstantSizeLambda            thrpt  200        331.741 ±       2.196  ops/s
MyBenchmark.longLengthConstantSizeLambdaParallel    thrpt  200       2827.695 ±     682.662  ops/s
MyBenchmark.longerLengthConstantSizeFor             thrpt  200    3551842.518 ±   42612.744  ops/s
MyBenchmark.longerLengthConstantSizeForEach         thrpt  200    3616285.629 ±   16335.379  ops/s
MyBenchmark.longerLengthConstantSizeLambda          thrpt  200    2791292.093 ±   12207.302  ops/s
MyBenchmark.longerLengthConstantSizeLambdaParallel  thrpt  200      50278.869 ±    1977.648  ops/s
MyBenchmark.mediumLengthConstantSizeFor             thrpt  200   55447999.297 ±  277442.812  ops/s
MyBenchmark.mediumLengthConstantSizeForEach         thrpt  200   57381287.954 ±  362751.975  ops/s
MyBenchmark.mediumLengthConstantSizeLambda          thrpt  200   15925281.039 ±   65707.093  ops/s
MyBenchmark.mediumLengthConstantSizeLambdaParallel  thrpt  200      60082.495 ±     581.405  ops/s
MyBenchmark.shortLengthConstantSizeFor              thrpt  200  132278188.475 ± 1132184.820  ops/s
MyBenchmark.shortLengthConstantSizeForEach          thrpt  200  124158664.044 ± 1112991.883  ops/s
MyBenchmark.shortLengthConstantSizeLambda           thrpt  200   18750818.019 ±  171239.562  ops/s
MyBenchmark.shortLengthConstantSizeLambdaParallel   thrpt  200     474054.951 ±    1344.705  ops/s
In an earlier question, I confirmed that these benchmarks appear to be "functionally equivalent" (just looking for additional eyes). Do these numbers appear to be in line, perhaps with independent runs of these benchmarks?
Another thing I've always been uncertain about with JMH output is what exactly the throughput numbers represent. For instance, what exactly does the "200" in the "Cnt" column represent? The throughput units are "operations per second", so what is one "operation": a single call to the benchmark method? If so, the last row would mean about 474k executions of the benchmark method per second.
Update:
I note that when I compare the "for" with the "lambda", starting with the "short" list and going to longer lists, the ratio between them is pretty large, but decreases, until the "long" list, where the ratio is even larger than for the "short" list (14%, 29%, 78%, and 11%). I find this surprising. I would have expected the ratio of the streams overhead to decrease as the work in the actual business logic increases. Anyone have any thoughts on that?
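For reference, the four percentages quoted above are the "Lambda" score divided by the plain "For" score at each list size. A minimal sketch that reproduces them from the scores in the results table (the class and method names are only illustrative, they are not part of the benchmark):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper, not part of the benchmark class: recomputes the
// stream-vs-for throughput ratios from the scores reported in the table.
class StreamToForRatio {

    // Stream throughput expressed as a percentage of for-loop throughput.
    static double percent(double streamOpsPerSec, double forOpsPerSec) {
        return 100.0 * streamOpsPerSec / forOpsPerSec;
    }

    public static void main(String[] args) {
        // label -> {Lambda score, For score}, both copied from the results table
        Map<String, double[]> scores = new LinkedHashMap<>();
        scores.put("short (2)", new double[] {18750818.019, 132278188.475});
        scores.put("medium (12)", new double[] {15925281.039, 55447999.297});
        scores.put("longer (300)", new double[] {2791292.093, 3551842.518});
        scores.put("long (300000)", new double[] {331.741, 2984.554});

        for (Map.Entry<String, double[]> e : scores.entrySet()) {
            double[] s = e.getValue();
            System.out.printf("%-14s %.1f%%%n", e.getKey(), percent(s[0], s[1]));
        }
    }
}
```

The same helper can be pointed at the "ForEach" or "LambdaParallel" rows to compare those variants against the plain loop.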
The cnt column is the number of measurement iterations, i.e. how many times a test is repeated (warmup iterations are not counted, and the total is taken across all forks). You can control that value using the following annotations:
@Measurement(iterations = 10, time = 50, timeUnit = TimeUnit.MILLISECONDS)
@Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
Here iterations is what cnt counts; time is the required duration of one iteration, and timeUnit is the unit of measurement of the time value.
You can control the output in several ways. For instance, you can change the unit of measurement for the time using @OutputTimeUnit(TimeUnit.XXXX), so you can get ops/us or ops/ms.
You can also change the mode: instead of measuring ops/time you can measure "average time", "sample time", etc. You can control this via the @BenchmarkMode({Mode.AverageTime}) annotation.
So let's say one iteration is 1 second long and you get 1000 ops/sec. This means that the benchmark method has been executed 1000 times in that iteration.
In other words, one operation is one execution of the benchmark method, unless you have the @OperationsPerInvocation(XXX) annotation, which means that each invocation of the method will count as XXX operations.
The error is calculated across all iterations.
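To make that arithmetic concrete, here is a rough plain-Java sketch of how an ops/s score relates to invocation counts. It is not how JMH is implemented; the class ThroughputMath and its method are made up for illustration, and the opsPerInvocation parameter only mimics the effect of @OperationsPerInvocation.

```java
// Hypothetical helper for illustration only; JMH does this bookkeeping internally.
class ThroughputMath {

    // Score for one iteration: operations completed divided by the iteration's duration.
    // opsPerInvocation is 1 unless the benchmark declares otherwise.
    static double opsPerSecond(long invocations, int opsPerInvocation, double iterationSeconds) {
        return invocations * (double) opsPerInvocation / iterationSeconds;
    }

    public static void main(String[] args) {
        // 1000 invocations in a 1-second iteration -> 1000 ops/s
        System.out.println(opsPerSecond(1000, 1, 1.0));

        // Same 1000 invocations, each declared to count as 300 operations -> 300000 ops/s
        System.out.println(opsPerSecond(1000, 300, 1.0));
    }
}
```

Declaring several operations per invocation is mainly useful when a benchmark method loops internally, for example over a 300-element list, and you want the score reported per element rather than per call.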
One more tip: instead of hardcoding each possible size, you can write a parameterized benchmark and then use that parameter in your setup.
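A minimal sketch of what that could look like, assuming JMH's @Param annotation. The class name MyParamBenchmark, the field names size and list, and the benchmark method names are placeholders; the four sizes are the ones used in the question. It needs the JMH dependency and harness to run, like the original class.

```java
import java.util.ArrayList;
import java.util.List;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

// Sketch only: one set of benchmark methods, run once per @Param value,
// instead of one hand-written method per list size.
@State(Scope.Benchmark)
public class MyParamBenchmark {

    // JMH injects each listed value in turn before calling setup().
    @Param({"2", "12", "300", "300000"})
    public int size;

    List<String> list;

    @Setup
    public void setup() {
        // Build the list for the size currently being benchmarked.
        list = new ArrayList<>(size);
        for (int ctr = 0; ctr < size; ++ctr) {
            list.add("xxx");
        }
    }

    @Benchmark
    public long forLoop() {
        long count = 0;
        for (String val : list) {
            if (val.length() == 3) { ++count; }
        }
        return count;
    }

    @Benchmark
    public long stream() {
        return list.stream().filter(s -> s.length() == 3).count();
    }

    @Benchmark
    public long parallelStream() {
        return list.stream().parallel().filter(s -> s.length() == 3).count();
    }
}
```

With this layout the results table gains a column for the parameter value, so each method appears once per size rather than needing sixteen separately named methods.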