The Java API documentation states that the combiner parameter of the collect method must be:
an associative, non-interfering, stateless function for combining two values, which must be compatible with the accumulator function
A combiner is a BiConsumer<R,R> that receives two parameters of type R and returns void. But the documentation does not say whether we should combine the elements into the first parameter or the second.
For instance, the following example may give different results depending on the order of combination: m1.addAll(m2) or m2.addAll(m1).
List<String> res = LongStream
.rangeClosed(1, 1_000_000)
.parallel()
.mapToObj(n -> "" + n)
.collect(ArrayList::new, ArrayList::add,(m1, m2) -> m1.addAll(m2));
I know that in this case we could simply use a method reference, such as ArrayList::addAll. Yet there are some cases where a lambda is required and we must combine the items in the correct order. Otherwise we could get an inconsistent result when processing in parallel.
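Here is one case where a method reference doesn't suffice. This hypothetical sketch (the class and method names are made up for this example) joins strings with a ", " separator into a StringBuilder. The combiner has to insert the separator between two partial results, so it needs a lambda, and it appends the right-hand builder to the left-hand one. It assumes the input strings are non-empty, because the separator logic checks for an empty builder.

```java
import java.util.stream.Stream;

public class SeparatorJoin {
    // Hypothetical example: joins non-empty strings with ", " via the 3-arg collect.
    static String join(Stream<String> words) {
        return words.collect(
                StringBuilder::new,
                // accumulator: store the element into the container (first argument)
                (sb, s) -> {
                    if (sb.length() > 0) sb.append(", ");
                    sb.append(s);
                },
                // combiner: fold the second (right) partial result into the first (left)
                (left, right) -> {
                    // add a separator only when both partial results are non-empty
                    if (left.length() > 0 && right.length() > 0) left.append(", ");
                    left.append(right);
                })
            .toString();
    }

    public static void main(String[] args) {
        System.out.println(join(Stream.of("a", "b", "c", "d").parallel())); // a, b, c, d
    }
}
```

Writing `right.append(left)` here would modify the builder that gets discarded.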
Is this stated anywhere in the Java 8 API documentation? Or does it really not matter?
It seems that this is not explicitly stated in the documentation. However, there is an ordering concept in the Streams API: a stream is either ordered or unordered. It may be unordered from the very beginning if the source spliterator is unordered (for example, if the stream source is a HashSet), or it may become unordered if the user explicitly calls the unordered() operation. If the stream is ordered, the collection procedure should also be stable, so I guess it's assumed that for ordered streams the combiner receives its arguments in strict order. However, this is not guaranteed for an unordered stream.
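The following minimal sketch shows how the ordering concept appears in practice, assuming a standard JDK. The class and method names are hypothetical. The ORDERED characteristic of the source spliterator differs between a List and a HashSet. An ordered parallel stream collected with a left-folding combiner such as ArrayList::addAll yields its elements in encounter order. The sketch says nothing about unordered streams, where no such order is promised.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class OrderingDemo {
    // Reports whether a stream built from this collection has an ORDERED source.
    static boolean isOrdered(Collection<?> source) {
        return source.stream().spliterator().hasCharacteristics(Spliterator.ORDERED);
    }

    // Collects 0..n-1 in parallel with a combiner that folds right into left.
    static ArrayList<Integer> collectInParallel(int n) {
        return IntStream.range(0, n)
                .parallel()
                .boxed()
                .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(3, 1, 2);
        System.out.println(isOrdered(list));                // true
        System.out.println(isOrdered(new HashSet<>(list))); // false

        List<Integer> expected = IntStream.range(0, 10_000).boxed().collect(Collectors.toList());
        System.out.println(collectInParallel(10_000).equals(expected)); // true
    }
}
```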
Of course it matters. When you use m2.addAll(m1) instead of m1.addAll(m2), it doesn’t just change the order of elements, it completely breaks the operation. Since a BiConsumer doesn’t return a result, you have no control over which object the caller uses as the result. The caller uses the first one, so modifying the second one instead causes data loss.
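To make the data-loss point concrete, this sketch simulates a single merge step by hand instead of running a parallel stream, which would be nondeterministic. The helper mergeStep is hypothetical. It follows the assumption stated above that the caller keeps the first container as the result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;

public class CombinerDirection {
    // Hypothetical helper mimicking one merge step of a parallel collect:
    // two partial containers are combined, and the caller keeps the FIRST one.
    static List<String> mergeStep(BiConsumer<List<String>, List<String>> combiner) {
        List<String> left = new ArrayList<>(Arrays.asList("1", "2"));
        List<String> right = new ArrayList<>(Arrays.asList("3", "4"));
        combiner.accept(left, right);
        return left; // the second container is discarded after this point
    }

    public static void main(String[] args) {
        System.out.println(mergeStep((m1, m2) -> m1.addAll(m2))); // [1, 2, 3, 4]
        System.out.println(mergeStep((m1, m2) -> m2.addAll(m1))); // [1, 2]  (3 and 4 are lost)
    }
}
```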
There is a hint in the accumulator function, which has the type BiConsumer<R,? super T>. In other words, it can’t do anything other than store the element of type T, provided as the second argument, into the container of type R, provided as the first argument.
Now look at the documentation of Collector. It uses a BinaryOperator as its combiner function and so allows the combiner to decide which argument to return (or even an entirely different result instance). There you find:
The associativity constraint says that splitting the computation must produce an equivalent result. That is, for any input elements t1 and t2, the results r1 and r2 in the computation below must be equivalent:
A a1 = supplier.get();
accumulator.accept(a1, t1);
accumulator.accept(a1, t2);
R r1 = finisher.apply(a1); // result without splitting
A a2 = supplier.get();
accumulator.accept(a2, t1);
A a3 = supplier.get();
accumulator.accept(a3, t2);
R r2 = finisher.apply(combiner.apply(a2, a3)); // result with splitting
So if we assume that the accumulator is applied in encounter order, the combiner has to combine its first and second arguments in left-to-right order to produce an equivalent result.
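The two computations above can be run directly against a real Collector. This sketch uses Collectors.toList(). It adds a hypothetical swapped flag that passes the partial results to the combiner in right-to-left order, which shows why argument order matters for equivalence.

```java
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class CollectorSplitting {
    // r1: both elements are accumulated into a single container.
    static <T, A, R> R withoutSplitting(Collector<T, A, R> c, T t1, T t2) {
        A a1 = c.supplier().get();
        c.accumulator().accept(a1, t1);
        c.accumulator().accept(a1, t2);
        return c.finisher().apply(a1);
    }

    // r2: each element gets its own container, then the combiner merges them.
    // 'swapped' (hypothetical) passes the partial results right-to-left.
    static <T, A, R> R withSplitting(Collector<T, A, R> c, T t1, T t2, boolean swapped) {
        A a2 = c.supplier().get();
        c.accumulator().accept(a2, t1);
        A a3 = c.supplier().get();
        c.accumulator().accept(a3, t2);
        A merged = swapped ? c.combiner().apply(a3, a2) : c.combiner().apply(a2, a3);
        return c.finisher().apply(merged);
    }

    public static void main(String[] args) {
        Collector<String, ?, List<String>> toList = Collectors.toList();
        System.out.println(withoutSplitting(toList, "t1", "t2"));     // [t1, t2]
        System.out.println(withSplitting(toList, "t1", "t2", false)); // [t1, t2]
        System.out.println(withSplitting(toList, "t1", "t2", true));  // [t2, t1]
    }
}
```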
Now, the three-arg version of Stream.collect has a slightly different signature. It uses a BiConsumer as the combiner precisely to support method references like ArrayList::addAll. Assuming consistency across all these operations, and considering the purpose of this signature change, we can safely assume that the first argument is the container to modify.
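This short sketch shows the signature difference. The class and method names are hypothetical. An unbound method reference like ArrayList::addAll makes the first argument the receiver, so as a BiConsumer<R,R> it folds the second container into the first. ArrayList.addAll returns boolean, so the same reference cannot serve as the BinaryOperator that Collector.of expects. There, a lambda has to mutate the left container and then return it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.stream.Collector;
import java.util.stream.Stream;

public class CombinerShapes {
    // Three-arg collect: the BiConsumer combiner can be a plain method reference.
    static ArrayList<String> viaCollect(Stream<String> s) {
        return s.collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    // Collector.of: the combiner is a BinaryOperator, so it must return a container.
    static ArrayList<String> viaCollector(Stream<String> s) {
        BinaryOperator<ArrayList<String>> combiner = (left, right) -> {
            left.addAll(right); // fold the right partial result into the left one
            return left;        // and return the left one as the merged result
        };
        Collector<String, ArrayList<String>, ArrayList<String>> collector =
                Collector.of(ArrayList::new, ArrayList::add, combiner);
        return s.collect(collector);
    }

    public static void main(String[] args) {
        // Unbound method reference: the first argument becomes the receiver,
        // so this behaves like (m1, m2) -> m1.addAll(m2).
        BiConsumer<ArrayList<String>, ArrayList<String>> addAll = ArrayList::addAll;
        ArrayList<String> m1 = new ArrayList<>(Arrays.asList("a"));
        ArrayList<String> m2 = new ArrayList<>(Arrays.asList("b"));
        addAll.accept(m1, m2);
        System.out.println(m1); // [a, b]

        System.out.println(viaCollect(Stream.of("x", "y", "z").parallel()));   // [x, y, z]
        System.out.println(viaCollector(Stream.of("x", "y", "z").parallel())); // [x, y, z]
    }
}
```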
But it seems that this was a late change, and the documentation hasn’t been adapted accordingly. The Mutable reduction section of the package documentation has been updated to show the actual signature of Stream.collect and its usage examples. However, it repeats exactly the same definition of the associativity constraint shown above, even though finisher.apply(combiner.apply(a2, a3)) doesn’t work if combiner is a BiConsumer…
The documentation issue has been reported as JDK-8164691 and addressed in Java 9. The new documentation says:
combiner - an associative, non-interfering, stateless function that accepts two partial result containers and merges them, which must be compatible with the accumulator function. The combiner function must fold the elements from the second result container into the first result container.
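Following the Java 9 wording, the associativity check can be restated for the three-argument collect. The finisher is dropped, and the first container is treated as the merged result. The helper below is a hypothetical sketch. It approximates "equivalent" with equals(), which works for lists but not necessarily for every container type.

```java
import java.util.ArrayList;
import java.util.Objects;
import java.util.function.BiConsumer;
import java.util.function.Supplier;

public class BiConsumerSplitting {
    // Hypothetical helper: the Collector associativity check rewritten for the
    // three-argument collect, where the combiner returns nothing.
    static <T, R> boolean splittingIsEquivalent(Supplier<R> supplier,
                                                BiConsumer<R, ? super T> accumulator,
                                                BiConsumer<R, R> combiner,
                                                T t1, T t2) {
        R a1 = supplier.get();
        accumulator.accept(a1, t1);
        accumulator.accept(a1, t2);    // a1 is the result without splitting

        R a2 = supplier.get();
        accumulator.accept(a2, t1);
        R a3 = supplier.get();
        accumulator.accept(a3, t2);
        combiner.accept(a2, a3);       // fold the second container into the first
        return Objects.equals(a1, a2); // a2 is the result with splitting
    }

    public static void main(String[] args) {
        System.out.println(BiConsumerSplitting.<String, ArrayList<String>>splittingIsEquivalent(
                ArrayList::new, ArrayList::add, ArrayList::addAll, "t1", "t2"));         // true
        System.out.println(BiConsumerSplitting.<String, ArrayList<String>>splittingIsEquivalent(
                ArrayList::new, ArrayList::add, (m1, m2) -> m2.addAll(m1), "t1", "t2")); // false
    }
}
```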