I have to write some code that adds the content of a Java 8 Stream to a List multiple times, and I'm having trouble figuring out what's the best way to do it. Based on what I read on SO (mainly this question: How to add elements of a Java8 stream into an existing List) and elsewhere, I've narrowed it down to the following options:
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
public class Accumulator<S, T> {
private final Function<S, T> transformation;
private final List<T> internalList = new ArrayList<T>();
public Accumulator(Function<S, T> transformation) {
this.transformation = transformation;
}
public void option1(List<S> newBatch) {
internalList.addAll(newBatch.stream().map(transformation).collect(Collectors.toList()));
}
public void option2(List<S> newBatch) {
newBatch.stream().map(transformation).forEach(internalList::add);
}
}
The idea is that the methods would be called multiple times for the same instance of Accumulator
. The choice is between using an intermediate list and callingCollection.addAll()
once outside of the stream or calling collection.add()
from the stream for each element.
I would tend to prefer option 2 which is more in the spirit of functional programming, and avoid creating an intermediate list, however, there might be benefits to calling addAll()
instead of calling add()
n times when n is large.
Is one of the two options significantly better than the other ?
EDIT: JB Nizet has a very cool answer that delays the transformation until all batches have been added. In my case, it is required that the transformation is performed straight-away.
PS: In my example code, I've used transformation
as a placeholder for whatever operations which need to be performed on the stream
First of all, your second variant should be:
public void option2(List<S> newBatch) {
newBatch.stream().map(transformation).forEachOrdered(internalList::add);
}
to be correct.
Besides that, the advantage of addAll
in
public void option1(List<S> newBatch) {
internalList.addAll(newBatch.stream().map(transformation).collect(Collectors.toList()));
}
is moot as the Collector
API does not allow the Stream to provide hints about the expected size to the Collector and requires the Stream to evaluate the accumulator function for every element, which is nothing else than ArrayList::add
in the current implementation.
So before this approach could get any benefit from addAll
, it filled an ArrayList
by repeatedly calling add
on an ArrayList
, including potential capacity increase operations. So you can stay with option2
without regret.
An alternative is to use a stream builder for temporary collections:
public class Accumulator<S, T> {
private final Function<S, T> transformation;
private final Stream.Builder<T> internal = Stream.builder();
public Accumulator(Function<S, T> transformation) {
this.transformation = transformation;
}
public void addBatch(List<S> newBatch) {
newBatch.stream().map(transformation).forEachOrdered(internal);
}
public List<T> finish() {
return internal.build().collect(Collectors.toList());
}
}
The stream builder uses a spined buffer which does not require copying the contents when increasing its capacity, but the solution still suffers from the fact that the final collection step involves filling an ArrayList
without an appropriate initial capacity (in the current implementation).
With the current implementation, it’s far more efficient to implement the finishing step as
public List<T> finish() {
return Arrays.asList(internal.build().toArray(…));
}
But this requires either, an IntFunction<T[]>
provided by the caller (as we can’t do that for a generic array type), or to perform an unchecked operation (pretending an Object[]
to be T[]
, which would be ok here, but still a nasty unchecked operation).
The best solution would be a third one, avoiding that internal list completely. Just let the stream create the final list for you:
Assuming you have a List<List<S>>
, containing your N batches, on which the same transformation must be applied, you would do
List<T> result =
batches.stream()
.flatMap(batch -> batch.stream())
.map(transformation)
.collect(Collectors.toList());