Peek() really to see the elements as they flow past

Posted 2020-07-22 19:32

Question:

My problem, expressed as simply as I can:

According to the JavaDoc:

The peek() method exists mainly to support debugging, where you want to see the elements as they flow past a certain point in a pipeline.

I have a pipe 10 meters long, and at 3 and 7 meters from the input end I have two markers [aka peek()] for checking/debugging my elements.

From the input end I feed in 1,2,3,4,5.

At x = 4 meters I have a filter() that filters out all elements less than or equal to 3.

According to the JavaDoc, I should be able to see what has happened to my input in the pipeline at 3 and 7 meters.

The output at marker 1 at 3 meters (.peek()) should be 1,2,3,4,5, shouldn't it? And the output at marker 2 at 7 meters should obviously be 4,5.

But that is not what actually happens: the first marker (.peek()) prints only 1,2,3 before the second marker starts printing.


The code I ran to test my theory:

import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final List<Integer> intList =
    Stream.of(1, 2, 3, 4, 5)
    .peek(it -> System.out.println("Before Filtering " + it)) // expected: 1,2,3,4,5
    .filter(it -> it >= 3)                                     // keeps 3, 4 and 5
    .peek(it -> System.out.println("After Filtering: " + it)) // expected: 4,5
    .collect(Collectors.toList());

Actual Output:

Before Filtering 1
Before Filtering 2
Before Filtering 3
After Filtering: 3
Before Filtering 4
After Filtering: 4
Before Filtering 5
After Filtering: 5

Expected Output (what a dev would expect after reading the JavaDoc: "...exists mainly to support debugging, where you want to see the elements as they flow past a certain point in a pipeline..."):

    Before Filtering 1
    Before Filtering 2
    Before Filtering 3
    Before Filtering 4
    Before Filtering 5
    After Filtering: 4
    After Filtering: 5

If .peek() is not just for debugging at a particular point in the pipeline, then the definition is ambiguous.

Sorry for the pipe story; I thought it was the best way to explain what I want to ask.

Answer 1:

No. Streams may be evaluated lazily as needed, and the order of operations is not strongly defined, especially when you're peek()ing. This allows the streams API to support very large streams without significant waste of time and memory, as well as allowing certain implementation simplifications. In particular, a single stage of the pipeline need not be fully evaluated before the next stage is.

Consider how wasteful the following code would be, given your assumptions:

IntStream.range(1, 1000000).skip(5).limit(10).forEach(System.out::println);

The stream starts with nearly one million elements (999,999) and ends up with 10. If we evaluated each stage fully, the intermediate results would hold 999,999, 999,994, and 10 elements, respectively.

As a second example, the following stream cannot be evaluated a stage at a time (because IntStream.generate returns an infinite stream):

IntStream.generate(/* some supplier */).limit(10).boxed().collect(Collectors.toList());
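Both examples can be checked with a small sketch. The class name LazyPullDemo and the pulled counter are made up for illustration, and IntStream.iterate stands in for the unspecified supplier. The counter records how many elements the source actually handed to the pipeline; for a sequential stream it should stay tiny even though the source is infinite, though the exact number is an implementation detail.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class LazyPullDemo {

    // Hypothetical counter: how many elements left the source.
    static final AtomicInteger pulled = new AtomicInteger();

    static List<Integer> firstTenAfterFive() {
        pulled.set(0);
        return IntStream.iterate(1, i -> i + 1)       // infinite source: 1, 2, 3, ...
                .peek(i -> pulled.incrementAndGet())  // count every element that is produced
                .skip(5)                              // drop 1..5
                .limit(10)                            // keep the next ten, then stop pulling
                .boxed()                              // IntStream -> Stream<Integer>
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstTenAfterFive());      // [6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
        System.out.println("elements pulled from the source: " + pulled.get());
    }
}
```

If each stage had to finish before the next one started, this program would never terminate, because the first stage has no end.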

Your pipeline does indeed pass every single element through the first peek, and then only a subset through the second peek. However, the pipeline performs this evaluation in element-major rather than stage-major order: it evaluates the pipe for 1, dropping it at the filter, and then does the same for 2. When it evaluates the pipe for 3, the element passes the filter, so both peek statements execute, and the same then happens for 4 and 5.
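A minimal sketch of the difference, assuming a sequential stream. The class PeekOrderDemo and its two methods are hypothetical names, and the peek() calls write to a list instead of printing so the order can be inspected. The first method is the pipeline from the question; the second forces stage-major order by collecting the first stage into a list before filtering, which is one way (not the only one) to get the output the question expected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PeekOrderDemo {

    // The pipeline from the question: each element travels the whole pipe in turn.
    static List<String> elementMajor() {
        List<String> log = new ArrayList<>();
        Stream.of(1, 2, 3, 4, 5)
              .peek(it -> log.add("before " + it))
              .filter(it -> it >= 3)
              .peek(it -> log.add("after " + it))
              .collect(Collectors.toList());
        return log;
    }

    // Stage-major order, forced by materializing the first stage into a list.
    static List<String> stageMajor() {
        List<String> log = new ArrayList<>();
        List<Integer> seen = Stream.of(1, 2, 3, 4, 5)
              .peek(it -> log.add("before " + it))
              .collect(Collectors.toList());   // all five elements pass the first peek here
        seen.stream()
              .filter(it -> it >= 3)
              .peek(it -> log.add("after " + it))
              .collect(Collectors.toList());
        return log;
    }

    public static void main(String[] args) {
        // [before 1, before 2, before 3, after 3, before 4, after 4, before 5, after 5]
        System.out.println(elementMajor());
        // [before 1, before 2, before 3, before 4, before 5, after 3, after 4, after 5]
        System.out.println(stageMajor());
    }
}
```

The second version pays for its ordering with an intermediate list holding every element, which is exactly the cost that lazy evaluation avoids.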



Answer 2:

The answer from Andrey Akhmetov is correct, but I want to add to it, because there are two issues here. One is the general issue of the semantics of stream pipelines -- which is really what your question is about. The secondary one is about the meaning, and limitations, of peek().

To the main question -- which has nothing to do with peek(), except that's how you are observing the state of what's going on -- your intuition about streams is simply incorrect. There is no reason to believe that in:

collection.stream()
          .filter(x -> x.foo() > 3)
          .map(X::toBar)
          .forEach(b -> System.out.println("Bar: " + b));

that all the filtering happens before all the mapping before all the printing. The stream is free to interleave filtering and mapping and printing in any order it likes. (There are some ordering guarantees in the aggregate.) The benefit here is that this is often more performant, more parallelizable, and more robust in some situations with infinite streams. As long as you follow the rules (i.e., don't rely on the side-effects of one stage in another stage), you won't be able to tell the difference, except maybe that your code runs faster.

The reason for the wiggly language of peek() is that for pipelines like:

long size = collection.stream()
                      .map(...)
                      .peek(...)
                      .count();

we can evaluate the answer without doing any mapping (since map() is known to be a size-preserving operation). A requirement to always provide the elements at peek() points would have undermined a number of useful optimizations, so the implementation is free to elide the entire middle of the pipeline if it can prove that doing so won't affect the answer. (It may produce fewer side effects, but if you care about side effects that much, maybe you shouldn't be using streams.)
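Here is a hedged sketch for observing this. CountElisionDemo and the peeked counter are hypothetical names, and the word list is arbitrary sample data. On Java 9 and later, count() may compute the size directly from the source, in which case the peek() action never runs. On Java 8 the pipeline is traversed and peek() runs once per element. The count is 3 either way.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class CountElisionDemo {

    // Hypothetical counter: how many times the peek() action actually ran.
    static final AtomicInteger peeked = new AtomicInteger();

    static long countWithPeek() {
        peeked.set(0);
        List<String> words = Arrays.asList("a", "bb", "ccc");   // sized source
        return words.stream()
                .map(String::length)                  // size-preserving
                .peek(n -> peeked.incrementAndGet())  // may be skipped entirely
                .count();
    }

    public static void main(String[] args) {
        System.out.println("count = " + countWithPeek());
        // Depends on the JDK: 0 when the pipeline is elided, 3 when it is traversed.
        System.out.println("peek() invocations = " + peeked.get());
    }
}
```

Adding a filter() before count() would make the size unknowable from the source alone, so the pipeline would have to be traversed and peek() would run again.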