Java 8 Stream API - Does any stateful intermediate

2020-05-25 07:05发布

问题:

Is the following statement true?

The sorted() operation is a “stateful intermediate operation”, which means that subsequent operations no longer operate on the backing collection, but on an internal state.

(Source and source - they seem to copy from each other or come from the same source.)

I have tested Stream::sorted as a snippet from sources above:

final List<Integer> list = IntStream.range(0, 10).boxed().collect(Collectors.toList());

list.stream()
    .filter(i -> i > 5)
    .sorted()
    .forEach(list::remove);

System.out.println(list);            // Prints [0, 1, 2, 3, 4, 5]

It works. I replaced Stream::sorted with Stream::distinct, Stream::limit and Stream::skip:

final List<Integer> list = IntStream.range(0, 10).boxed().collect(Collectors.toList());

list.stream()
    .filter(i -> i > 5)
    .distinct()
    .forEach(list::remove);          // Throws NullPointerException

To my surprise, the NullPointerException is thrown.

All the tested methods follow the stateful intermediate operation characteristics. Yet, this unique behavior of Stream::sorted is not documented nor the Stream operations and pipelines part explains whether the stateful intermediate operations really guarantee a new source collection.

Where my confusion comes from and what is the explanation of the behavior above?

回答1:

The API documentation makes no such guarantee “that subsequent operations no longer operate on the backing collection”, hence, you should never rely on such a behavior of a particular implementation.

Your example happens to do the desired thing by accident; there’s not even a guarantee that the List created by collect(Collectors.toList()) supports the remove operation.

To show a counter-example

Set<Integer> set = IntStream.range(0, 10).boxed()
    .collect(Collectors.toCollection(TreeSet::new));
set.stream()
    .filter(i -> i > 5)
    .sorted()
    .forEach(set::remove);

throws a ConcurrentModificationException. The reason is that the implementation optimizes this scenario, as the source is already sorted. In principle, it could do the same optimization to your original example, as forEach is explicitly performing the action in no specified order, hence, the sorting is unnecessary.

There are other optimizations imaginable, e.g. sorted().findFirst() could get converted to a “find the minimum” operation, without the need to copy the element into a new storage for sorting.

So the bottom line is, when relying on unspecified behavior, what may happen to work today, may break tomorrow, when new optimizations are added.



回答2:

Well sorted has to be a full copying barrier for the stream pipeline, after all your source could be not sorted; but this is not documented as such, thus do not rely on it.

This is not just about sorted per-se, but what other optimization can be done to the stream pipeline, so that sorted could be entirely skipped. For example:

List<Integer> sortedList = IntStream.range(0, 10)
            .boxed()
            .collect(Collectors.toList());

    StreamSupport.stream(() -> sortedList.spliterator(), Spliterator.SORTED, false)
            .sorted()
            .forEach(sortedList::remove); // fails with CME, thus no copying occurred 

Of course, sorted needs to be a full barrier and stop to do an entire sort, unless, of course, it can be skipped, thus the documentation makes no such promises, so that we don't run in weird surprises.

distinct on the other hand does not have to be a full barrier, all distinct does is check one element at a time, if it is unique; so after a single element is checked (and it is unique) it is passed to the next stage, thus without being a full barrier. Either way, this is not documented also...



回答3:

You shouldn't have brought up the cases with a terminal operation forEach(list::remove) because list::remove is an interfering function and it violates the "non-interference" principle for terminal actions.

It's vital to follow the rules before wondering why an incorrect code snippet causes unexpected (or undocumented) behaviour.

I believe that list::remove is the root of the problem here. You wouldn't have noticed the difference between the operations for this scenario if you'd written a proper action for forEach.