Does a good use case exist for skip() on parallel streams?

Posted 2019-02-20 07:33


EDITED in September 2015


When I initially asked this question in February 2015, the behaviour reported in the linked question was counter-intuitive, though arguably allowed by the specification (despite some minor inconsistencies in the docs).

However, Tagir Valeev asked a new question in June 2015, in which I think he clearly demonstrated that the behaviour reported in this question was actually a bug. Brian Goetz answered his question and acknowledged that it was a bug not to stop the back-propagation of the UNORDERED characteristic of the Stream at skip() when it was triggered by a terminal operation that isn't required to respect the encounter order of the elements (such as forEach()). Furthermore, in the comments on his own answer, he shared a link to the issue filed in the JDK's bug tracking system.

The status of the issue is now RESOLVED, and its fix version is 9, meaning that the fix will be available in JDK9. However, it has also been backported to JDK8 update 60, build 22.

So from JDK8u60-b22 onwards, this question no longer makes sense, since skip() now behaves according to intuition, even on parallel streams.
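
To make "according to intuition" concrete, here is a minimal sketch. The class and method names are made up for illustration, and the result described assumes a JDK that includes the fix. On an ordered parallel stream, skip(n) should drop the first n elements in encounter order, even when the terminal operation is forEach(), which is itself free to ignore that order.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

// Hypothetical demo class, not taken from the linked questions.
class ParallelSkipDemo {

    // Returns the elements of [0, size) that survive skip(n) on a parallel stream
    // whose terminal operation, forEach(), does not have to respect encounter order.
    static Set<Integer> survivorsOfSkip(int size, int n) {
        Set<Integer> seen = ConcurrentHashMap.newKeySet(); // thread-safe sink
        IntStream.range(0, size)   // ordered source
                 .parallel()
                 .skip(n)          // should drop 0 .. n-1, not an arbitrary n elements
                 .boxed()
                 .forEach(seen::add);
        return seen;
    }

    public static void main(String[] args) {
        Set<Integer> seen = survivorsOfSkip(1000, 10);
        System.out.println("survivors: " + seen.size());
        System.out.println("contains 0? " + seen.contains(0));
    }
}
```

With the fix in place, the surviving set should be exactly 10..999. According to the bug discussion referenced above, affected builds could drop a different set of 10 elements.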


My original question follows...


Recently I had a discussion with some colleagues about this. I say it's quite useless to use skip() on parallel streams, since there doesn't seem to be a good use case for it. They tell me about performance gains, FJ pool processing, the number of cores available to the JVM, etc.; however, they couldn't give me any practical example of its usage.

Does a good use case exist for skip() on parallel streams?

See this question here on SO. Please read the question and answers, as well as the comments, as there are tons of good arguments there.

1 answer
放我归山
Post #2 · 2019-02-20 07:46

The choice of sequential vs parallel is simply one of execution strategy. The option for parallelism exists so that, if the specifics of the problem (problem size, choice of stream operations, computational work per element, available processors, memory bandwidth, etc.) permit, then a performance benefit may be gained by going parallel. Not all combinations of these specifics will admit a performance benefit (and some may even garner a penalty), so we leave it to the user to specify the operations separately from the execution strategy.

For operations like skip() or limit(), which are intrinsically tied to encounter order, it is indeed hard to extract a lot of parallelism, but it is possible; this generally occurs when the computational work per element (often called 'Q') is very high.
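
A rough sketch of such a high-Q case follows. The trial-division primality test is only a stand-in for expensive per-element work, and all names are hypothetical. The filter is where the cost sits and can be evaluated in parallel, while skip() still drops the first matches in encounter order, so both execution modes should give the same result.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

// Hypothetical example: "skip the first k matches of an expensive predicate".
class HighQSkip {

    // Stand-in for costly per-element work (assumption: real workloads would be heavier).
    static boolean isPrime(long n) {
        if (n < 2) return false;
        for (long d = 2; d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Same pipeline in both modes; only the execution strategy changes.
    static List<Long> primesAfterSkipping(long from, long to, long skipCount, boolean parallel) {
        LongStream source = LongStream.range(from, to);
        if (parallel) {
            source = source.parallel();
        }
        return source.filter(HighQSkip::isPrime) // expensive part, parallelizable
                     .skip(skipCount)            // tied to encounter order
                     .boxed()
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> sequential = primesAfterSkipping(2, 2_000_000, 100, false);
        List<Long> parallel = primesAfterSkipping(2, 2_000_000, 100, true);
        System.out.println("same result: " + sequential.equals(parallel));
    }
}
```

Whether the parallel run is actually faster depends on the factors listed in the answer (problem size, cost per element, available cores), so it is worth measuring rather than assuming.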

Such cases are probably rare (which might be your point); this doesn't make the combination of operation and execution mode "useless", simply of limited usefulness. But one doesn't design an API with multiple dimensions (operations, execution modes) based on the combinations that one can imagine are useful; assuming each combination has sensible semantics (which it does in this case), it is best to allow all operations in all modes and let the users decide which is useful for them.
