Exact meaning of non-interference in Java 8 streams

Posted 2019-01-18 12:45

Question:

Does the non-interference requirement for streams over non-concurrent data-structure sources mean that we can't change the state of an element of the data structure while a stream pipeline is executing (in addition to not being allowed to change the source data structure itself)? (Question 1)

In the section on non-interference in the java.util.stream package description, it says: "For most data sources, preventing interference means ensuring that the data source is not modified at all during the execution of the stream pipeline."

This passage does not mention modifying the state of elements, does it?

For example, assuming "shapes" is a non-thread-safe collection (such as ArrayList), does the code below constitute interference? (Question 2)

shapes.stream() 
      .filter(s -> s.getColor() == BLUE)
      .forEach(s -> s.setColor(RED));

This example is taken from a reliable source (to say the least), so it should be correct. But what if I changed stream() to parallelStream()? Would it still be safe and correct? (Question 3)

On the other hand, "Mastering Lambdas" by Maurice Naftalin, another reliable source, makes it clear that changing the state (value) of elements in a pipeline operation is indeed interference. From the section on non-interference (3.2.3):

"But the rules for streams forbid any modification of stream sources—including, for example, changing the value of an element— by any thread, not only pipeline operations."

If what the book says is correct, does it mean we can't use the Stream API to modify the state of elements (using forEach), and have to do it with a regular iterator (or a for-each loop, or Iterable.forEach) instead? (Question 4)

Answer 1:

There's a broader class of functions called "functions with side effects". The Javadoc statement is correct and complete: here, interference means modifying the mutable source. Another case is stateful expressions: expressions that depend on application state or change that state. You may want to read the Parallelism tutorial on the Oracle site.
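A minimal sketch of what interfering with the source looks like, assuming a plain ArrayList (the class and method names are made up for illustration). The first method adds to the list from inside its own pipeline; ArrayList's spliterator detects this on a best-effort basis and throws ConcurrentModificationException. The second computes the new elements first and touches the source only after the pipeline has finished.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.stream.Collectors;

class SourceInterferenceDemo {

    // Interference: the pipeline structurally modifies its own source.
    // Returns true if ArrayList's spliterator detected this. The check is
    // best-effort, so code must never rely on the exception being thrown.
    static boolean appendDuringStream() {
        List<String> source = new ArrayList<>(Arrays.asList("a", "b", "c"));
        try {
            source.stream().forEach(s -> source.add(s + "!")); // don't do this
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // No interference: compute the new elements first, then modify the
    // source only after the pipeline has completed.
    static List<String> appendAfterStream() {
        List<String> source = new ArrayList<>(Arrays.asList("a", "b", "c"));
        List<String> extra = source.stream()
                .map(s -> s + "!")
                .collect(Collectors.toList());
        source.addAll(extra);
        return source;
    }

    public static void main(String[] args) {
        System.out.println(appendDuringStream()); // true with OpenJDK's ArrayList
        System.out.println(appendAfterStream());  // [a, b, c, a!, b!, c!]
    }
}
```

Note that the exception only surfaces after the elements have been processed, so by then the damage is already done; it is a diagnostic, not a safeguard.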

In general you can modify the stream elements themselves, and that should not be called "interference". Beware, though, if your stream source produces the same mutable object several times (for example, Collections.nCopies(10, new MyMutableObject()).parallelStream()). While it's guaranteed that the same stream element is not processed concurrently by several threads, if your stream produces the same object twice, you can easily get a race condition when modifying it in forEach, for example.
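To make the nCopies caveat concrete, here is a sketch with a hypothetical MyMutableObject that holds a single int field. Collections.nCopies returns a list containing the same reference n times, so a parallel forEach can hand that one object to several threads at once. Creating a fresh object per element (here via IntStream.range and mapToObj) gives each thread its own object to mutate.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SharedElementDemo {

    // Hypothetical mutable element type, named after the answer's example.
    static final class MyMutableObject {
        int value;
    }

    // Counts distinct instances by reference identity, not equals().
    static int distinctInstances(List<MyMutableObject> list) {
        Set<MyMutableObject> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        seen.addAll(list);
        return seen.size();
    }

    // Risky source: ten references to ONE object. A parallel forEach that
    // mutates it races with itself, so the final value is unpredictable.
    static List<MyMutableObject> sharedSource() {
        return Collections.nCopies(10, new MyMutableObject());
    }

    // Safer source: one fresh object per stream element.
    static List<MyMutableObject> distinctSource() {
        return IntStream.range(0, 10)
                .mapToObj(i -> new MyMutableObject())
                .collect(Collectors.toList());
    }

    // Each element is handed to one thread, so with distinct objects every
    // increment touches its own object and the outcome is deterministic.
    static List<Integer> incrementInParallel(List<MyMutableObject> source) {
        source.parallelStream().forEach(o -> o.value++);
        return source.stream().map(o -> o.value).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(distinctInstances(sharedSource()));   // 1
        System.out.println(distinctInstances(distinctSource())); // 10
        System.out.println(incrementInParallel(distinctSource()));
    }
}
```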

So while stateful expressions are sometimes a code smell and should be used with care (and avoided if a stateless alternative exists), they are probably fine as long as they don't interfere with the stream source. When a stateless expression is required (for example, in Stream.map), the API docs mention it explicitly. The forEach documentation requires only non-interference.
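A small sketch of the stateless/stateful distinction (class and method names are illustrative only). A stateless mapping function gives the same list whether the stream is sequential or parallel. A stateful one that reads shared state can be free of data races (here thanks to AtomicInteger) and still produce results that depend on thread scheduling.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class StatelessVsStateful {

    // Stateless: each result depends only on its input element, so the
    // parallel run yields the same list as the sequential one.
    static List<Integer> doubled(boolean parallel) {
        IntStream numbers = IntStream.rangeClosed(1, 1000);
        if (parallel) {
            numbers = numbers.parallel();
        }
        return numbers.map(x -> x * 2).boxed().collect(Collectors.toList());
    }

    // Stateful: each element is replaced by a tag read from shared mutable state.
    // AtomicInteger rules out data races, but which element receives which tag
    // depends on thread scheduling, so the resulting list is not reproducible.
    static List<Integer> tagged() {
        AtomicInteger counter = new AtomicInteger();
        return IntStream.rangeClosed(1, 1000).parallel()
                .map(x -> counter.getAndIncrement())
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(doubled(true).equals(doubled(false))); // true
        System.out.println(tagged().subList(0, 10)); // may differ between runs
    }
}
```

Only the multiset of tags (0 to 999) is stable for tagged(); their positions are not, which is exactly why Stream.map asks for a stateless function.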

So back to your questions:

Question 1: No, we can change the element state; that is not called interference (though it does make the expression stateful).

Question 2: No, there is no interference, unless your stream source contains the same object more than once.

Question 3: You can safely use parallelStream() there.

Question 4: No, you can use the Stream API in this case.
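A runnable sketch of the question's example with parallelStream(), assuming hypothetical Shape and Color types that provide the getColor/setColor calls used in the question. The list is never modified (no add, remove or set); only the state of distinct elements changes, and each element is handed to exactly one thread. forEach does not return until every element has been processed, so the caller sees the updated colors afterwards.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ShapesDemo {

    // Hypothetical enum standing in for the question's color constants.
    enum Color { RED, GREEN, BLUE }

    // Hypothetical mutable element type matching the question's calls.
    static final class Shape {
        private Color color;
        Shape(Color color) { this.color = color; }
        Color getColor() { return color; }
        void setColor(Color color) { this.color = color; }
    }

    // Recolors BLUE shapes to RED in parallel. Safe here because every
    // Shape in the list is a distinct object and the list itself is untouched.
    static void recolor(List<Shape> shapes) {
        shapes.parallelStream()
              .filter(s -> s.getColor() == Color.BLUE)
              .forEach(s -> s.setColor(Color.RED));
    }

    // 300 distinct shapes cycling RED, GREEN, BLUE (100 of each).
    static List<Shape> sample() {
        Color[] palette = Color.values();
        return IntStream.range(0, 300)
                .mapToObj(i -> new Shape(palette[i % palette.length]))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Shape> shapes = sample();
        recolor(shapes);
        long blue = shapes.stream().filter(s -> s.getColor() == Color.BLUE).count();
        System.out.println(blue); // 0
    }
}
```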



Answer 2:

Modifying the state of an object stored in a data structure is different from reassigning an element of a data structure.

When the author writes "changing the value of an element", they presumably mean something like assigning a new object to an index of an existing List.
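A sketch of that distinction using a hypothetical mutable Box class. Mutating a Box leaves every index of the list pointing at the same object, so the source is unchanged. Calling set() on the list during the pipeline replaces what an index refers to, which does modify the source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MutateVsReassign {

    // Hypothetical mutable element type.
    static final class Box {
        int value;
        Box(int value) { this.value = value; }
    }

    // Changing element state: every index still refers to the same Box,
    // so the source list itself is never modified.
    static void scaleElements(List<Box> boxes) {
        boxes.stream().forEach(b -> b.value *= 10);
    }

    // Reassigning an element while the pipeline runs DOES modify the source
    // and is interference, e.g. (do not do this):
    //     boxes.stream().forEach(b -> boxes.set(0, new Box(-1)));
    // ArrayList.set is not a structural modification, so ArrayList's
    // fail-fast check does not catch it and the bug can go unnoticed.

    public static void main(String[] args) {
        List<Box> boxes = new ArrayList<>(Arrays.asList(new Box(1), new Box(2), new Box(3)));
        Box first = boxes.get(0);
        scaleElements(boxes);
        System.out.println(boxes.get(0) == first); // true: same object, new state
        System.out.println(boxes.get(0).value);    // 10
    }
}
```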

From your link:

It is best to avoid any side-effects in the lambdas passed to stream methods. While some side-effects, such as debugging statements that print out values are usually safe, accessing mutable state from these lambdas can cause data races or surprising behavior since lambdas may be executed from many threads simultaneously, and may not see elements in their natural encounter order. Non-interference includes not only not interfering with the source, but not interfering with other lambdas; this sort of interference can arise when one lambda modifies mutable state and another lambda reads it.

As long as the non-interference requirement is satisfied, we can execute parallel operations safely and with predictable results even on non-thread-safe sources such as ArrayList.

This applies specifically to parallelism and is no different from any other concurrent programming: modifying shared state can cause visibility problems between threads.
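A sketch of the "lambdas interfering with each other" case from the quoted docs, with illustrative names. Having many forEach calls add to one plain ArrayList is the classic mistake. Letting the stream collect the results, or using a synchronized sink, avoids it. Collectors.toList() is the idiomatic choice and, for an ordered source, also preserves encounter order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SharedStateDemo {

    // What NOT to do: many worker threads call add() on one plain ArrayList.
    //     List<Integer> out = new ArrayList<>();
    //     IntStream.range(0, 10_000).parallel().forEach(out::add);
    // Depending on timing, elements can be lost, nulls can appear, or
    // ArrayIndexOutOfBoundsException can be thrown.

    // Preferred: let the stream accumulate results; for an ordered source,
    // Collectors.toList() also keeps encounter order.
    static List<Integer> collectGather() {
        return IntStream.range(0, 10_000).parallel()
                .boxed()
                .collect(Collectors.toList());
    }

    // Also correct: a synchronized sink. Every add contends for one lock,
    // and the order in which elements arrive is unspecified.
    static List<Integer> synchronizedGather() {
        List<Integer> out = Collections.synchronizedList(new ArrayList<>());
        IntStream.range(0, 10_000).parallel().forEach(out::add);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collectGather().size());      // 10000
        System.out.println(synchronizedGather().size()); // 10000
    }
}
```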