While reading the documentation about streams, I came across the following sentences:
... attempting to access mutable state from behavioral parameters presents you with a bad choice ... if you do not synchronize access to that state, you have a data race and therefore your code is broken ... [1]
If the behavioral parameters do have side-effects ... [there are no] guarantees that different operations on the "same" element within the same stream pipeline are executed in the same thread. [2]
For any given element, the action may be performed at whatever time and in whatever thread the library chooses. [3]
These sentences don't make a distinction between sequential and parallel streams. So my questions are:
- In which thread is the pipeline of a sequential stream executed? Is it always the calling thread or is an implementation free to choose any thread?
- In which thread is the action parameter of the forEach terminal operation executed if the stream is sequential?
- Do I have to use any synchronization when using sequential streams?
- [1+2] https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html
- [3] https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#forEach-java.util.function.Consumer-
- Stream's terminal operations are blocking operations. In case there is no parallel excution, the thread that executes the terminal operation runs all the operations in the pipeline.
Definition 1.1. Pipeline is a couple of chained methods.
Definition 1.2. Intermediate operations will be located everywhere in the stream except at the end. They return a stream object and does not execute any operation in the pipeline.
Definition 1.3. Terminal operations will be located only at the end of the stream. They execute the pipeline. They does not return stream object so no other Intermidiate operations or terminal operations can be added after them.
- From the first solution we can conclude that the calling thread will execute the
action
method inside the forEach
terminal operation on each element in the calling stream.
Java 8 introduces us the Spliterator
interface. It has the capabilities of Iterator
but also a set of operations to help performing and spliting a task in parallel.
When calling forEach
from primitive streams in sequential execution, the calling thread will invoke the Spliterator.forEachRemaining
method:
@Override
public void forEach(IntConsumer action) {
if (!isParallel()) {
adapt(sourceStageSpliterator()).forEachRemaining(action);
}
else {
super.forEach(action);
}
}
You can read more on Spliterator
in my tutorial: Chapter 7: Spliterator
- As long as you don't mutate any shared state between multiple threads in one of the stream operations(and it is forbidden - explained soon), you do not need to use any additional synchronization tool or algorithm when you want to run parallel streams.
Stream operations like reduce use accumulator
and combiner
functions for executing parallel streams. The streams library by definition forbids mutation. You should avoid it.
There are a lot of definitions in concurrent and parallel programming. I will introduce a set of definitions that will serve us best.
Definition 8.1. Cuncurrent programming is the ability to solve a task using additional synchronization algorithms.
Definition 8.2. Parallel programming is the ability to solve a task without using additional synchronization algorithms.
You can read more about it in my tutorial: Chapter 8: Parallel Streams.