Can a Java 8 `Stream` be parallel without you even

Posted 2019-03-24 02:55

Question:

As I see it, the obvious code when using Java 8 streams, whether "object" streams or primitive streams (that is, IntStream and friends), would be to just use:

someStreamableResource.stream().whatever()

But then, quite a few "streamable resources" also have .parallelStream().

What isn't clear when reading the javadoc is whether .stream() streams are always sequential, and whether .parallelStream() streams are always parallel...

And then there is Spliterator, and in particular its .characteristics(), one of them being that it can be CONCURRENT, or even IMMUTABLE.

My gut feeling is that, in fact, whether a Stream can be parallel by default, or parallel at all, is governed by its underlying Spliterator...

Am I on the right track? I have read, and read again, the javadocs, and still cannot come up with a clear answer to this question...

Answer 1:

First, through the lens of specification. Whether a stream is parallel or sequential is part of a stream's state. Stream-creation methods should specify whether they create a sequential or parallel stream (and most in the JDK do), but they are not required to say so. If your stream source doesn't say, don't assume. If someone passes you a stream, don't assume.
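A minimal sketch of what "part of the stream's state" means in practice, using only standard java.util.stream calls. Per the Stream javadoc, the most recent parallel()/sequential() call applies to the whole pipeline, so code that receives a stream can check isParallel() or, if it relies on sequential execution, call sequential() itself. The toUpperSet helper and its deliberately non-thread-safe HashSet are hypothetical illustrations; in real code collect(Collectors.toSet()) would be the idiomatic choice.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

public class StreamModeDemo {
    // The mode belongs to the whole pipeline: the most recent parallel()/sequential() call wins.
    static boolean lastCallWins() {
        Stream<String> s = Stream.of("a", "b", "c")
                .parallel()
                .map(String::toUpperCase)
                .sequential();
        return s.isParallel(); // false
    }

    // Hypothetical helper: it mutates a non-thread-safe HashSet from forEach,
    // so it forces sequential mode instead of trusting whatever the caller passed in.
    static Set<String> toUpperSet(Stream<String> input) {
        Set<String> out = new HashSet<>();
        input.sequential().forEach(w -> out.add(w.toUpperCase()));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lastCallWins());
        List<String> words = Arrays.asList("x", "y", "z");
        System.out.println(toUpperSet(words.parallelStream()));
    }
}
```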

Parallel streams are allowed to fall back to sequential at their discretion (since a sequential implementation is a parallel implementation, just a potentially imperfect one); the opposite is not true.

Now, through the lens of implementation. In the stream-creation methods in Collections and other JDK classes, we stick to a discipline of "create a sequential stream unless the user explicitly asks for parallelism". (Other libraries, however, make different choices. If they're polite, they'll specify their behavior.)
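To check that discipline on a few JDK sources, here is a small sketch (assuming a JDK 8 or later runtime) that simply asks each stream for isParallel(). The second method shows the hook library authors use to make their own choice: StreamSupport.stream(spliterator, parallel) takes the mode as an explicit boolean. The class and method names are illustrative only.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.stream.IntStream;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class JdkSourcesDemo {
    // Each of these JDK factories returns a sequential stream.
    static boolean[] defaultModes() {
        BufferedReader reader = new BufferedReader(new StringReader("a\nb"));
        return new boolean[] {
            Arrays.stream(new int[] {1, 2, 3}).isParallel(),
            Stream.of("a", "b").isParallel(),
            IntStream.range(0, 10).isParallel(),
            "hello".chars().isParallel(),
            reader.lines().isParallel()
        };
    }

    // Library authors pick the mode explicitly through the boolean flag.
    static boolean viaStreamSupport(boolean parallel) {
        return StreamSupport.stream(Arrays.asList(1, 2, 3).spliterator(), parallel).isParallel();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(defaultModes())); // [false, false, false, false, false]
        System.out.println(viaStreamSupport(false));         // false
        System.out.println(viaStreamSupport(true));          // true
    }
}
```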

The relationship between stream parallelism and Spliterator only goes in one direction. A Spliterator can refuse to split -- effectively denying any parallelism -- but it can't demand that a client split it. So an uncooperative Spliterator can undermine parallelism, but not determine it.
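A sketch of such an "uncooperative" Spliterator, using a hypothetical NonSplittingRange class written only for this illustration: its trySplit() always returns null. The stream built from it is flagged parallel (isParallel() returns true), yet the forEach only ever runs on one thread, because nothing was ever split off for other workers.

```java
import java.util.Set;
import java.util.Spliterator;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class NoSplitDemo {
    // Hypothetical spliterator over the integers [start, end) that never splits.
    static final class NonSplittingRange implements Spliterator<Integer> {
        private int next;
        private final int end;

        NonSplittingRange(int start, int end) {
            this.next = start;
            this.end = end;
        }

        @Override
        public boolean tryAdvance(Consumer<? super Integer> action) {
            if (next >= end) {
                return false;
            }
            action.accept(next++);
            return true;
        }

        @Override
        public Spliterator<Integer> trySplit() {
            return null; // refuse to split: nothing can be handed to another thread
        }

        @Override
        public long estimateSize() {
            return end - next;
        }

        @Override
        public int characteristics() {
            return ORDERED | SIZED | SUBSIZED | NONNULL;
        }
    }

    // The stream is flagged parallel because we asked for it...
    static boolean flaggedParallel() {
        return StreamSupport.stream(new NonSplittingRange(0, 10), true).isParallel();
    }

    // ...but every element is processed by one and the same thread.
    static int threadsUsed(int n) {
        Stream<Integer> stream = StreamSupport.stream(new NonSplittingRange(0, n), true);
        Set<String> threads = ConcurrentHashMap.newKeySet();
        stream.forEach(i -> threads.add(Thread.currentThread().getName()));
        return threads.size();
    }

    public static void main(String[] args) {
        System.out.println("flagged parallel: " + flaggedParallel()); // true
        System.out.println("threads used: " + threadsUsed(100_000));  // 1
    }
}
```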



Answer 2:

The API doesn't have much to say on the matter:

Streams are created with an initial choice of sequential or parallel execution. (For example, Collection.stream() creates a sequential stream, and Collection.parallelStream() creates a parallel one.)
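A quick way to see that initial choice for yourself (the class and method names are illustrative only). With the JDK's ArrayList, stream() reports sequential and parallelStream() reports parallel; calling parallel() on an existing stream switches its mode.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CollectionStreamsDemo {
    static final List<Integer> LIST = new ArrayList<>(Arrays.asList(1, 2, 3, 4));

    static boolean viaStream() {
        return LIST.stream().isParallel();            // false: sequential by default
    }

    static boolean viaParallelStream() {
        return LIST.parallelStream().isParallel();    // true for the JDK's ArrayList
    }

    static boolean viaParallelCall() {
        return LIST.stream().parallel().isParallel(); // true: mode switched explicitly
    }

    public static void main(String[] args) {
        System.out.println(viaStream() + " " + viaParallelStream() + " " + viaParallelCall());
    }
}
```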

Regarding your line of reasoning that some intermediate operations may not be thread safe, you may want to read the package summary. The package summary discusses intermediate operations, stateful vs stateless, and how to properly use a Stream in some depth.

Side-effects in behavioral parameters to stream operations are, in general, discouraged, as they can often lead to unwitting violations of the statelessness requirement, as well as other thread-safety hazards.

Behavioral parameters are the functions (typically lambdas) passed as arguments to stream operations, such as the predicate given to filter() or the mapper given to map().
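As a hedged illustration of that guidance (the evens method is made up for this example): the commented-out version mutates a shared ArrayList from a parallel forEach, which is exactly the kind of side effect the package summary warns about, while the active version keeps the lambdas stateless and leaves the accumulation to collect(). Because the source is ordered, the parallel and sequential results come out identical.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SideEffectDemo {
    // Discouraged: a behavioral parameter that mutates shared, non-thread-safe state.
    //   List<Integer> evens = new ArrayList<>();
    //   IntStream.range(0, n).parallel().filter(i -> i % 2 == 0).forEach(i -> evens.add(i)); // data race
    //
    // Preferred: stateless lambdas, with accumulation left to collect().
    static List<Integer> evens(int n, boolean parallel) {
        IntStream source = IntStream.range(0, n);
        if (parallel) {
            source = source.parallel();
        }
        return source.filter(i -> i % 2 == 0)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(evens(10, true)); // [0, 2, 4, 6, 8]
        System.out.println(evens(1_000, true).equals(evens(1_000, false))); // true
    }
}
```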

the API cannot make any assumptions

The API can make any assumptions it wishes; the onus is on the user of the API to meet them. However, assumptions may limit usability. The Stream API discourages behavioral parameters that are stateful or not thread-safe. Since this is discouraged rather than prohibited, most streams are sequential "by default".



Answer 3:

Well, answering my own question...

After thinking about it a little more seriously (go figure, such things only happen after I actually ask the question), I actually came up with a reason why...

Intermediate operations may NOT be thread-safe, so the API cannot make any assumptions. Hence, if users want a parallel stream, they have to ask for it explicitly and make sure that all intermediate operations used in the stream are thread-safe.

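There is, however, the somewhat misleading case of Collectors: since a Collector cannot know in advance whether it will be used as the terminal operation of a parallel or a sequential stream, its contract makes it clear that, "just to be safe", every Collector must behave correctly in parallel. Its supplier, accumulator, combiner and finisher must satisfy identity and associativity constraints so that sequential and parallel executions produce equivalent results.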
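For reference, here is a sketch of how the Collector contract plays out with a deliberately non-thread-safe container (the joining() helper is hypothetical; the JDK's own Collectors.joining() already does this). In a parallel reduction the framework gives each partition its own container from the supplier and merges partial results with the combiner, so the StringBuilder is never shared between threads; the accumulator is only called concurrently on one shared container when a collector declares Collector.Characteristics.CONCURRENT.

```java
import java.util.stream.Collector;
import java.util.stream.IntStream;

public class CollectorContractDemo {
    // Hypothetical joining collector backed by a non-thread-safe StringBuilder.
    static Collector<String, StringBuilder, String> joining() {
        return Collector.of(
                StringBuilder::new,                                               // supplier: one container per partition
                (StringBuilder sb, String s) -> sb.append(s),                     // accumulator: fold one element in
                (StringBuilder left, StringBuilder right) -> left.append(right), // combiner: merge left, then right
                StringBuilder::toString);                                         // finisher
    }

    static String digits(int n, boolean parallel) {
        IntStream source = IntStream.range(0, n);
        if (parallel) {
            source = source.parallel();
        }
        return source.mapToObj(i -> Integer.toString(i)).collect(joining());
    }

    public static void main(String[] args) {
        System.out.println(digits(10, true)); // 0123456789
        System.out.println(digits(500, true).equals(digits(500, false))); // true
    }
}
```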



Answer 4:

The Java Tutorials lesson on parallelism states: "When you create a stream, it is always a serial stream unless otherwise specified." And the javadoc of Collection.parallelStream() says: "It is allowable for this method to return a sequential stream."

CONCURRENT and IMMUTABLE aren't (directly) related to this. They specify, respectively, whether the underlying source can be modified concurrently without invalidating the spliterator, and whether the source is immutable. The Spliterator method that pretty much defines the behavior of parallelStream() is trySplit. Terminal operations on a parallel stream will eventually invoke trySplit, and whatever that implementation does will, at the end of the day, define which parts of the data, if any, are processed in parallel.
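To see the difference between those characteristics and the actual splitting, here is a small sketch over three standard collections (the class and helper names are illustrative). The characteristics only describe the source; trySplit() is the call that carves off work for another thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CopyOnWriteArrayList;

public class SpliteratorTraitsDemo {
    static final List<Integer> DATA = Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);

    static Spliterator<Integer> arrayList() {
        return new ArrayList<>(DATA).spliterator();             // ORDERED | SIZED | SUBSIZED
    }

    static Spliterator<Integer> concurrentQueue() {
        return new ConcurrentLinkedQueue<>(DATA).spliterator(); // includes CONCURRENT
    }

    static Spliterator<Integer> copyOnWrite() {
        return new CopyOnWriteArrayList<>(DATA).spliterator();  // includes IMMUTABLE (a snapshot)
    }

    // trySplit() carves off a prefix and leaves the remainder in the original spliterator.
    static long[] splitSizes() {
        Spliterator<Integer> rest = arrayList();
        Spliterator<Integer> prefix = rest.trySplit();
        return new long[] {prefix == null ? 0 : prefix.estimateSize(), rest.estimateSize()};
    }

    public static void main(String[] args) {
        System.out.println(arrayList().hasCharacteristics(Spliterator.CONCURRENT));       // false
        System.out.println(concurrentQueue().hasCharacteristics(Spliterator.CONCURRENT)); // true
        System.out.println(copyOnWrite().hasCharacteristics(Spliterator.IMMUTABLE));      // true
        System.out.println(Arrays.toString(splitSizes()));                                // [5, 5]
    }
}
```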



Answer 5:

This part is not constrained by the specification right now; however, the short answer is no. The parallelStream() and stream() methods just give you access to parallel or sequential implementations of common basic operations for processing the stream. The runtime cannot assume that your operations are thread-safe unless you explicitly call parallelStream() or parallel(), so the default implementation of stream() behaves sequentially.