Can Stream#limit return fewer elements than expected?

Posted 2019-05-08 19:26

Question:

If the Stream s below has at least n elements, in what situations, if any, can the stream sLimit have fewer than n elements?

Stream sLimit = s.limit(n);

Reason for the question: in this answer, I read that:

Despite the appearances, using limit(10) doesn't necessarily result in a SIZED stream with exactly 10 elements -- it might have fewer.

Answer 1:

I think Holger's and Sotirios' answers are accurate, but inasmuch as I'm the guy who made the statement, I guess I should explain myself.

I'm mainly talking about spliterator characteristics, in particular the SIZED characteristic. This is basically "static" information about the stream stages that is known at pipeline setup time, before the stream actually executes. Indeed, it's used for determining the execution strategy for the stream, so it has to be known before the stream executes.

The limit() operation creates a spliterator that wraps its upstream spliterator, so the limit spliterator needs to determine what characteristics to return. Even if its upstream spliterator is SIZED, it doesn't know the exact size, so it has to turn off the SIZED characteristic.
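One way to observe this is to ask a pipeline's spliterator for its characteristics before anything is traversed. The following is only a minimal sketch; the class name SizedCheck and the helper isSized are made up for illustration. The result for the limited stream depends on the JDK in use: the behaviour described above is that of the implementation at the time of this answer, and later releases may report SIZED here, so no expected value is shown for that line.

```java
import java.util.Spliterator;
import java.util.stream.IntStream;

// Hypothetical helper class, not part of the JDK.
class SizedCheck {
    // True if the stream's spliterator reports SIZED; nothing is traversed.
    static boolean isSized(IntStream stream) {
        return stream.spliterator().hasCharacteristics(Spliterator.SIZED);
    }

    public static void main(String[] args) {
        // A range source knows its size up front, so this prints true.
        System.out.println("range:         " + isSized(IntStream.range(0, 100)));
        // JDK-dependent result; deliberately no expected value given.
        System.out.println("range + limit: " + isSized(IntStream.range(0, 100).limit(10)));
    }
}
```

Running it on the JDK you actually deploy on shows whether limit() keeps or drops SIZED there.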

So if you, the programmer, were to write:

IntStream.range(0, 100).limit(10)

you'd say of course that stream has exactly 10 elements. (And it will.) But the resulting spliterator is still not SIZED. After all, the limit operator doesn't know the difference between the above and this:

IntStream.range(0, 1).limit(10)

at least in terms of spliterator characteristics.
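To make the difference between the two pipelines concrete, here is a small sketch that counts what each one produces. The class name LimitCount and the method countLimited are illustrative only.

```java
import java.util.stream.IntStream;

// Hypothetical demo class, not part of the JDK.
class LimitCount {
    // Number of elements actually produced by range(0, upper).limit(max).
    static long countLimited(int upper, long max) {
        return IntStream.range(0, upper).limit(max).count();
    }

    public static void main(String[] args) {
        System.out.println(countLimited(100, 10)); // prints 10: the source has enough elements
        System.out.println(countLimited(1, 10));   // prints 1: the source runs out first
    }
}
```

Both pipelines end in the same limit(10) stage, yet they produce different numbers of elements, which is why the stage alone cannot promise an exact size.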

So that's why, even though there are times when it seems like it ought to, the limit operator doesn't return a stream of known size. This in turn affects the splitting strategy, which impacts parallel efficiency.



Answer 2:

You misunderstood the statement. If the Stream has at least n elements and you invoke limit(n) on it, it will have exactly n elements, but the Stream implementation might not be aware of that and may therefore perform less than optimally.

In contrast, certain Stream sources (Spliterators) know for sure that they have a fixed size, e.g. when creating a Stream for an array or an IntStream via IntStream.range. They can be optimized better than a Stream with a limit(n).
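As a rough illustration of sources that know their size, Spliterator.getExactSizeIfKnown() returns the element count for a SIZED spliterator and -1 otherwise. The class and method names below are invented for this sketch, and Object::new merely stands in for an arbitrary supplier.

```java
import java.util.Arrays;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical demo class, not part of the JDK.
class KnownSize {
    // An array-backed stream reports the array length.
    static long arraySize() {
        return Arrays.stream(new int[] {1, 2, 3}).spliterator().getExactSizeIfKnown();
    }

    // IntStream.range reports the length of the range.
    static long rangeSize() {
        return IntStream.range(0, 100).spliterator().getExactSizeIfKnown();
    }

    // Stream.generate has no known size, and limit(10) cannot supply one.
    static long generatedSize() {
        return Stream.generate(Object::new).limit(10).spliterator().getExactSizeIfKnown();
    }

    public static void main(String[] args) {
        System.out.println(arraySize());     // prints 3
        System.out.println(rangeSize());     // prints 100
        System.out.println(generatedSize()); // prints -1 (size unknown)
    }
}
```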

When you create a parallel Stream via Stream.generate(MyClass::new).limit(10), the constructor will still be invoked sequentially and only follow-up operations might run in parallel. In contrast, when using IntStream.range(0, n).mapToObj(i -> new MyClass()), the entire Stream operation, including the constructor calls, can run in parallel.
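The two variants can be compared with a sketch like the following, which records the name of every thread that runs the constructor. MyClass here is a hypothetical stand-in that takes a Set to record into (so a lambda replaces MyClass::new), and the class and method names are made up. Which threads show up depends on the JDK, the machine, and scheduling, so no expected output is given.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical demo class, not part of the JDK.
class ParallelConstruction {
    // Stand-in for MyClass; remembers which thread ran its constructor.
    static final class MyClass {
        MyClass(Set<String> threads) {
            threads.add(Thread.currentThread().getName());
        }
    }

    // Unsized source plus limit(10).
    static List<MyClass> viaGenerate(Set<String> threads) {
        return Stream.generate(() -> new MyClass(threads))
                .parallel()
                .limit(10)
                .collect(Collectors.toList());
    }

    // Sized source; the constructor call is an ordinary map step.
    static List<MyClass> viaRange(Set<String> threads) {
        return IntStream.range(0, 10)
                .parallel()
                .mapToObj(i -> new MyClass(threads))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<String> generateThreads = ConcurrentHashMap.newKeySet();
        Set<String> rangeThreads = ConcurrentHashMap.newKeySet();
        System.out.println(viaGenerate(generateThreads).size() + " via generate on " + generateThreads);
        System.out.println(viaRange(rangeThreads).size() + " via range on " + rangeThreads);
    }
}
```

Both methods return 10 objects; the interesting part is comparing the recorded thread names between the two variants on your own setup.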