Sorted stream derived from Infinite Stream fails t

2019-07-23 09:35发布

问题:

import java.util.stream.*;
import java.util.*;

class TestInfiniteStream {
    public static void main(String args[]) {
        IntStream infiniteStream = new Random().ints();
        IntStream sortedStream = infiniteStream.sorted();

        sortedStream.forEach(i -> System.out.println(i));
    }
}

After compiling and executing this code I get the following error.

Exception in thread "main" java.lang.IllegalArgumentException: Stream size exceeds max array size

Does sorting a stream fail on an infinite stream?

回答1:

The simple answer to “Does sorting a stream fail on an infinite stream?” is “Yes.sorted() is a stateful intermediate operation which has been implemented by buffering the entire contents and sorting it, before passing any elements to the downstream operations.

In theory, it doesn’t need to be that way. Since you are using forEach, which has been explicitly specified as processing the elements in an undefined order, the sorting step could be omitted in your new Random().ints().sorted().forEach(System.out::println); use case. But even if you used forEachOrdered, there is a theoretically achievable correct answer. Since your stream is infinite and will repeatedly contain all int values, a correct sorted output would print -2147483648 (==Integer.MIN_VALUE) forever, as that’s the smallest value that is contained infinite times in that stream.

However, to give this correct answer, the implementation would need specific code to handle this scenario, which is of not much practical value. Instead, the implementation handles this case like any other sorting of a stream scenario, which will fail for infinite streams.

In this specific case, the stream has an optimization that leads to a different, unusual exception message. As Eugene pointed out, this stream behaves like a fixed size stream of Long.MAX_VALUE (==2⁶³) elements rather than a truly infinite stream. That’s fair, considering that the stream produced by Random will repeat after 2⁴⁸ values, so the entire stream has been repeated 32768 times before it will end instead of running forever. You are unlikely to witness this “sudden” ending after processing 9223372036854775807 elements anyway. But a consequence of this optimization is that the stream will fail-fast with the “Stream size exceeds max array size” message instead of failing with an “OutOfMemoryError” after some processing.

If you eliminate the size information, e.g. via

new Random().ints().filter(x -> true).sorted().forEach(System.out::println);

the operation will try to buffer until failing with java.lang.OutOfMemoryError. The same happens with

IntStream.generate(new Random()::nextInt).sorted().forEach(System.out::println);

which provides no size information to the stream in the first place. In either case, it never gets to sort anything as the buffering happens before the sorting starts.

If you want to get “sorted runs for some limit of elements” as you said in a comment, you have to apply a limit before sorting, e.g.

new Random().ints().limit(100).sorted().forEach(System.out::println);

though it will be more efficient to still use a sized stream, e.g.

new Random().ints(100).sorted().forEach(System.out::println);


回答2:

No, you cannot sort an infinite stream.

Your infinite stream new Random().ints() produces more integers than can be stored in an array (or any array), which is used behind the scenes for storing the integers to be sorted. An array of course cannot hold an infinite number of integers; only a capacity close to Integer.MAX_VALUE numbers.

Taking a step back, how would anyone or anything sort an infinite number of integers? Nothing can and no one can; it would take at least an infinite amount of time.

Does sorting a stream fail on an infinite stream?

You've kind of answered your own question; the IllegalArgumentException is the specific cause of the failure. It only occurred because you made a program that attempted to do it, and you ran up against a Java array limitation.

The sorted method will attempt to read the entire stream before sorting anything. It won't do any intermediate sorting before it has read the entire stream, so no sorting will be done, partial or full.



回答3:

Well that is an interesting question, unfortunately your key description is in comments. First of all, there isn't really an "infinite" stream in your case, the responsible spliterator RandomIntsSpliterator even says that:

... and also by treating "infinite" as equivalent to Long.MAX_VALUE...

And now the interesting part, that spliterator will will report these characteristics:

public int characteristics() {
        return (Spliterator.SIZED | Spliterator.SUBSIZED |
                Spliterator.NONNULL | Spliterator.IMMUTABLE);
}

Well, I don't know the reasons of why, but reporting SIZED or SUBSIZED for an infinite stream... May be it does not matter either (because you usually chain these with a limit).

Well because SIZED is reported (with a size of Long.MAX_VALUE), there is a sink for this internally SizedIntSortingSink that has a check:

if (size >= Nodes.MAX_ARRAY_SIZE)
      throw new IllegalArgumentException(Nodes.BAD_SIZE);

which obviously will fail.

On the contrast IntStream.generate does not report SIZED - which makes sense to me, so the entire input has to be buffered by sorted and later handled to the terminal operation; obviously this will fail with an OutOfMemory.

It's also interesting to prove that distinct will not act as a full barrier here, waiting for all values to be processed. Instead it can pass them to the terminal operation, once it knows that this has not been seen before:

Random r = new Random();
IntStream.generate(() -> r.nextInt())
        .distinct()
        .forEach(System.out::println);

This will run for quite a while, before finally dying with a OutOfMemory. or as very nicely added by Holger in comments, it could hang too:

 IntStream.generate(() -> new Random()
          .nextInt(2))
          .distinct()
          .forEach(System.out::println);