What's the better way to add elements from a S

2019-07-12 13:10发布

问题:

I have to write some code that adds the content of a Java 8 Stream to a List multiple times, and I'm having trouble figuring out what's the best way to do it. Based on what I read on SO (mainly this question: How to add elements of a Java8 stream into an existing List) and elsewhere, I've narrowed it down to the following options:

import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class Accumulator<S, T> {


    private final Function<S, T> transformation;
    private final List<T> internalList = new ArrayList<T>();

    public Accumulator(Function<S, T> transformation) {
        this.transformation = transformation;
    }

    public void option1(List<S> newBatch) {
        internalList.addAll(newBatch.stream().map(transformation).collect(Collectors.toList()));
    }

    public void option2(List<S> newBatch) {
        newBatch.stream().map(transformation).forEach(internalList::add);
    }
}

The idea is that the methods would be called multiple times for the same instance of Accumulator. The choice is between using an intermediate list and callingCollection.addAll() once outside of the stream or calling collection.add() from the stream for each element.

I would tend to prefer option 2 which is more in the spirit of functional programming, and avoid creating an intermediate list, however, there might be benefits to calling addAll() instead of calling add() n times when n is large.

Is one of the two options significantly better than the other ?

EDIT: JB Nizet has a very cool answer that delays the transformation until all batches have been added. In my case, it is required that the transformation is performed straight-away.

PS: In my example code, I've used transformation as a placeholder for whatever operations which need to be performed on the stream

回答1:

First of all, your second variant should be:

public void option2(List<S> newBatch) {
  newBatch.stream().map(transformation).forEachOrdered(internalList::add);
}

to be correct.

Besides that, the advantage of addAll in

public void option1(List<S> newBatch) {
  internalList.addAll(newBatch.stream().map(transformation).collect(Collectors.toList()));
}

is moot as the Collector API does not allow the Stream to provide hints about the expected size to the Collector and requires the Stream to evaluate the accumulator function for every element, which is nothing else than ArrayList::add in the current implementation.

So before this approach could get any benefit from addAll, it filled an ArrayList by repeatedly calling add on an ArrayList, including potential capacity increase operations. So you can stay with option2 without regret.

An alternative is to use a stream builder for temporary collections:

public class Accumulator<S, T> {
    private final Function<S, T> transformation;
    private final Stream.Builder<T> internal = Stream.builder();

    public Accumulator(Function<S, T> transformation) {
        this.transformation = transformation;
    }

    public void addBatch(List<S> newBatch) {
        newBatch.stream().map(transformation).forEachOrdered(internal);
    }

    public List<T> finish() {
        return internal.build().collect(Collectors.toList());
    }
}

The stream builder uses a spined buffer which does not require copying the contents when increasing its capacity, but the solution still suffers from the fact that the final collection step involves filling an ArrayList without an appropriate initial capacity (in the current implementation).

With the current implementation, it’s far more efficient to implement the finishing step as

public List<T> finish() {
    return Arrays.asList(internal.build().toArray(…));
}

But this requires either, an IntFunction<T[]> provided by the caller (as we can’t do that for a generic array type), or to perform an unchecked operation (pretending an Object[] to be T[], which would be ok here, but still a nasty unchecked operation).



回答2:

The best solution would be a third one, avoiding that internal list completely. Just let the stream create the final list for you:

Assuming you have a List<List<S>>, containing your N batches, on which the same transformation must be applied, you would do

List<T> result = 
    batches.stream()
           .flatMap(batch -> batch.stream())
           .map(transformation)
           .collect(Collectors.toList());