If the Stream s below has at least n elements, what are the situations where the stream sLimit may have fewer than n elements, if any?
Stream sLimit = s.limit(n);
Reason for the question: in this answer, I read that:
Despite the appearances, using limit(10) doesn't necessarily result in a SIZED stream with exactly 10 elements -- it might have fewer.
I think Holger's and Sotirios' answers are accurate, but inasmuch as I'm the guy who made the statement, I guess I should explain myself.
I'm mainly talking about spliterator characteristics, in particular the SIZED characteristic. This is basically "static" information about the stream stages that is known at pipeline setup time, but before the stream actually executes. Indeed, it's used for determining the execution strategy for the stream, so it has to be known before the stream executes.
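To make the "known before the stream executes" point concrete, here is a minimal sketch that asks a pipeline's spliterator for its SIZED characteristic without traversing any elements. The class and method names are made up for illustration. It uses filter() rather than limit() as the stage that clears SIZED, since filter can never know how many elements will pass.

```java
import java.util.Spliterator;
import java.util.stream.IntStream;

class SizedCharacteristicDemo {
    // Reports whether the spliterator backing the pipeline advertises SIZED.
    // spliterator() is a terminal operation, but no elements are traversed here.
    static boolean isSized(IntStream stream) {
        return stream.spliterator().hasCharacteristics(Spliterator.SIZED);
    }

    public static void main(String[] args) {
        // A range source knows its exact size up front.
        System.out.println("range:          " + isSized(IntStream.range(0, 100)));
        // A filter stage cannot know how many elements will pass, so SIZED is cleared.
        System.out.println("range + filter: " + isSized(IntStream.range(0, 100).filter(i -> i % 2 == 0)));
    }
}
```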
The limit() operation creates a spliterator that wraps its upstream spliterator, so the limit spliterator needs to determine what characteristics to return. Even if its upstream spliterator is SIZED, it doesn't know the exact size, so it has to turn off the SIZED characteristic.
So if you, the programmer, were to write:
IntStream.range(0, 100).limit(10)
you'd say of course that stream has exactly 10 elements. (And it will.) But the resulting spliterator is still not SIZED. After all, the limit operator doesn't know the difference between the above and this:
IntStream.range(0, 1).limit(10)
at least in terms of spliterator characteristics.
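The two pipelines above can be compared directly. This is only a sketch with made-up helper names. The element counts are certain, but whether the limited pipeline reports SIZED is printed rather than assumed, because the statement here describes the implementation at the time of writing and the result may differ on other JDK versions.

```java
import java.util.Spliterator;
import java.util.stream.IntStream;

class LimitSizeDemo {
    // Number of elements actually produced by range(0, upper).limit(n).
    static long limitedCount(int upper, long n) {
        return IntStream.range(0, upper).limit(n).count();
    }

    // Whether the limited pipeline advertises SIZED before it runs.
    // Only printed below, never assumed: the answer may depend on the JDK version.
    static boolean limitedIsSized(int upper, long n) {
        return IntStream.range(0, upper).limit(n)
                .spliterator()
                .hasCharacteristics(Spliterator.SIZED);
    }

    public static void main(String[] args) {
        System.out.println("range(0, 100).limit(10) count: " + limitedCount(100, 10)); // 10
        System.out.println("range(0, 1).limit(10) count:   " + limitedCount(1, 10));   // 1
        System.out.println("range(0, 100).limit(10) SIZED? " + limitedIsSized(100, 10));
        System.out.println("range(0, 1).limit(10) SIZED?   " + limitedIsSized(1, 10));
    }
}
```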
So that's why, even though there are times when it seems like it ought to, the limit operator doesn't return a stream of known size. This in turn affects the splitting strategy, which impacts parallel efficiency.
You misunderstood the statement. If the Stream has at least n elements and you invoke limit(n) on it, it will have exactly n elements, but the Stream implementation might not be aware of that and may therefore perform less than optimally.
In contrast, certain Stream sources (Spliterators) know for sure that they have a fixed size, e.g. when creating a Stream for an array or an IntStream via IntStream.range. They can be optimized better than a Stream with a limit(n).
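As a rough illustration of sources that know their size, the sketch below asks three source spliterators for their exact size via Spliterator.getExactSizeIfKnown(), which returns -1 when the size is unknown. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class KnownSizeSourcesDemo {
    // An array-backed stream knows its length.
    static long arraySize() {
        return Arrays.stream(new String[] {"a", "b", "c"}).spliterator().getExactSizeIfKnown();
    }

    // A range knows how many ints lie between its bounds.
    static long rangeSize() {
        return IntStream.range(0, 100).spliterator().getExactSizeIfKnown();
    }

    // An infinite generator has no known size, so -1 is reported.
    static long generatedSize() {
        return Stream.generate(() -> "x").spliterator().getExactSizeIfKnown();
    }

    public static void main(String[] args) {
        System.out.println("array:    " + arraySize());     // 3
        System.out.println("range:    " + rangeSize());     // 100
        System.out.println("generate: " + generatedSize()); // -1
    }
}
```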
When you create a parallel Stream via Stream.generate(MyClass::new).limit(10), the constructor will still be invoked sequentially and only follow-up operations might run in parallel. In contrast, when using IntStream.range(0, n).mapToObj(i -> new MyClass()), the entire Stream operation, including the constructor calls, can run in parallel.
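One way to try the two variants side by side is sketched below. The MyClass here is a hypothetical stand-in that records the thread it was constructed on, and the helper names are invented. Both variants yield exactly n objects. Which threads do the constructing depends on the JDK version and the available cores, so that part is only printed for inspection.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class ParallelConstructionDemo {
    // Hypothetical stand-in for MyClass; remembers the constructing thread.
    static final class MyClass {
        final String createdOn = Thread.currentThread().getName();
    }

    // Infinite generator truncated by limit(n).
    static List<MyClass> viaGenerate(int n) {
        return Stream.generate(MyClass::new).parallel().limit(n).collect(Collectors.toList());
    }

    // Sized range source mapped to new objects.
    static List<MyClass> viaRange(int n) {
        return IntStream.range(0, n).parallel().mapToObj(i -> new MyClass()).collect(Collectors.toList());
    }

    // Distinct names of the threads that ran the constructors of the collected objects.
    static Set<String> constructingThreads(List<MyClass> objects) {
        return objects.stream().map(o -> o.createdOn).collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        System.out.println("generate + limit: " + constructingThreads(viaGenerate(10)));
        System.out.println("range + mapToObj: " + constructingThreads(viaRange(10)));
    }
}
```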