How do Java 8 parallel streams behave on a thrown exception?

Posted 2020-02-26 05:25

Question:

How do Java 8 parallel streams behave on a thrown exception in the consuming clause, for example in forEach handling? For example, the following code:

import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.IntStream;

final AtomicBoolean throwException = new AtomicBoolean(true);
IntStream.range(0, 1000)
    .parallel()
    .forEach(i -> {
        // Throw only on one of the threads.
        if (throwException.compareAndSet(true, false)) {
            throw new RuntimeException("One of the tasks threw an exception. Index: " + i);
        }
    });

Does it stop processing elements immediately? Does it wait for the already started elements to finish? Does it wait for the whole stream to finish? Does it start handling new stream elements after the exception is thrown?

When does it return? Immediately after the exception? After all, or only some, of the elements have been handled by the consumer?

Do elements continue being handled after the parallel stream has thrown the exception? (I found a case where this happened.)

Is there a general rule here?

EDIT (15-11-2016)

Trying to determine whether the parallel stream returns early, I found that the behavior is not deterministic:

@Test
public void testParallelStreamWithException() {
    AtomicInteger overallCount = new AtomicInteger(0);
    AtomicInteger afterExceptionCount = new AtomicInteger(0);
    AtomicBoolean throwException = new AtomicBoolean(true);

    try {
        IntStream.range(0, 1000)
            .parallel()
            .forEach(i -> {
                overallCount.incrementAndGet();
                afterExceptionCount.incrementAndGet();
                try {
                    System.out.println(i + " Sleeping...");
                    Thread.sleep(1000);
                    System.out.println(i + " After Sleeping.");
                }
                catch (InterruptedException e) {
                    e.printStackTrace();
                }
                // Throw only on one of the threads and not on main thread.
                if (!Thread.currentThread().getName().equals("main") && throwException.compareAndSet(true, false)) {
                    System.out.println("Throwing exception - " + i);
                    throw new RuntimeException("One of the tasks threw an exception. Index: " + i);
                }
            });
        Assert.fail("Should not get here.");
    }
    catch (Exception e) {
        System.out.println("Caught Exception. Resetting the afterExceptionCount to zero - 0.");
        afterExceptionCount.set(0);
    }
    System.out.println("Overall count: " + overallCount.get());
    System.out.println("After exception count: " + afterExceptionCount.get());
}

Late return when throwing from a thread other than main. This caused a lot of new elements to be handled well after the exception was thrown. On my machine, about 200 elements were handled after the exception was thrown. BUT not all 1000 elements were handled. So what's the rule here? Why were more elements handled even though the exception had been thrown?

Early return when removing the not (!) sign, so that the exception is thrown on the main thread. Only the already started elements finished processing, and no new ones were handled. The stream returned early here, which is not consistent with the previous behavior.

What am I missing here?

Answer 1:

When an exception is thrown in one of the stages, the stream does not wait for the other operations to finish; the exception is re-thrown to the caller. That is how ForkJoinPool handles it.

In contrast, findFirst, for example, when run in parallel, presents the result to the caller only after ALL operations have finished processing (even if the result is known before all operations have finished).

Put in other words: it will return early, but will leave all the running tasks to finish.
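A minimal sketch of the "re-thrown to the caller" part, assuming a stock OpenJDK: exactly one element throws, and the calling thread catches it from `forEach`. The class name `ExceptionPropagation`, the use of `IllegalStateException` and the "boom" message are illustrative choices, not from the question. As far as I know, when the exception was thrown on a worker thread, ForkJoinPool may hand the caller a new exception of the same type whose cause is the original, so the sketch checks the type and prints the cause rather than relying on the exact message.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.IntStream;

// Sketch (hypothetical class): an exception thrown inside a parallel forEach surfaces in the caller.
final class ExceptionPropagation {

    // Runs a parallel forEach where exactly one element throws and returns what the caller caught.
    static RuntimeException runAndCatch() {
        AtomicBoolean throwOnce = new AtomicBoolean(true);
        try {
            IntStream.range(0, 1000)
                    .parallel()
                    .forEach(i -> {
                        // Only the first element to reach this line throws.
                        if (throwOnce.compareAndSet(true, false)) {
                            throw new IllegalStateException("boom at index " + i);
                        }
                    });
        } catch (RuntimeException e) {
            return e;
        }
        return null; // not reached: the first processed element always throws
    }

    public static void main(String[] args) {
        RuntimeException e = runAndCatch();
        System.out.println("caught: " + e);
        // If a worker thread threw, this may be a copy whose cause is the original exception.
        System.out.println("cause:  " + e.getCause());
    }
}
```

The sketch only shows where the exception ends up; how many other elements still run after the caller has caught it is timing-dependent.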

EDIT to answer the last comment

This is very much explained by Holger's answer (link in comments), but here are some details.

1) When killing all BUT the main thread, you are also killing all the tasks that were supposed to be handled by those threads. So that number should actually be closer to 250, since there are 1000 tasks and 4 threads (I assume this returns 3):

int result = ForkJoinPool.getCommonPoolParallelism();

Theoretically there are 1000 tasks and 4 threads, each supposed to handle 250 tasks; then you kill 3 of them, meaning 750 tasks are lost. There are 250 tasks left to execute, and ForkJoinPool will spawn 3 new threads to execute these remaining 250 tasks.
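The arithmetic above can be checked against your own machine with a small sketch. It assumes, as the answer does, that elements are spread evenly over the common-pool workers plus the calling thread; in reality the stream splits into more pieces than threads and workers steal work, so this is only a back-of-the-envelope figure. `EvenShare` and `perThread` are hypothetical names; `ForkJoinPool.getCommonPoolParallelism()` is the call quoted in the answer.

```java
import java.util.concurrent.ForkJoinPool;

// Back-of-the-envelope sketch (hypothetical class) of the even-split assumption in the answer.
final class EvenShare {

    // Hypothetical helper: elements per thread if n elements were spread evenly
    // over `workers` common-pool threads plus the calling thread.
    static int perThread(int n, int workers) {
        return n / (workers + 1);
    }

    public static void main(String[] args) {
        int workers = ForkJoinPool.getCommonPoolParallelism();
        System.out.println("available processors: " + Runtime.getRuntime().availableProcessors());
        System.out.println("common pool parallelism: " + workers);
        System.out.println("even share of 1000 elements: " + perThread(1000, workers));
    }
}
```

With a parallelism of 3 this gives 1000 / (3 + 1) = 250, the figure used in the answer.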

A few things you can try. First, change your stream like this (making the stream unsized):

Random random = new Random(); // java.util.Random
IntStream.generate(random::nextInt).limit(1000).parallel().forEach(i -> { /* same body as in the test above */ });

This time many more operations would finish, because the initial split point is unknown and is chosen by a different strategy. You could also try changing this:
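The difference between the two sources shows up in their spliterators. A small sketch, assuming OpenJDK, compares `IntStream.range` with `IntStream.generate(...).limit(...)`: the range reports `Spliterator.SIZED` and can be split at a known point, while the generated-and-limited stream does not report its size. The class `SizedVsUnsized` and the fixed `Random` seed are illustrative choices.

```java
import java.util.Random;
import java.util.Spliterator;
import java.util.stream.IntStream;

// Sketch (hypothetical class): compare a sized and an unsized IntStream source.
final class SizedVsUnsized {

    static Spliterator.OfInt rangeSource() {
        return IntStream.range(0, 1000).spliterator();
    }

    static Spliterator.OfInt generatedSource() {
        Random random = new Random(42); // fixed seed only to make runs repeatable
        return IntStream.generate(random::nextInt).limit(1000).spliterator();
    }

    public static void main(String[] args) {
        Spliterator.OfInt range = rangeSource();
        System.out.println("range SIZED: " + range.hasCharacteristics(Spliterator.SIZED));
        // A sized range knows where to split; on OpenJDK it is cut in half.
        Spliterator.OfInt prefix = range.trySplit();
        System.out.println("prefix: " + prefix.estimateSize() + ", rest: " + range.estimateSize());

        Spliterator.OfInt generated = generatedSource();
        System.out.println("generate+limit SIZED: " + generated.hasCharacteristics(Spliterator.SIZED));
    }
}
```

Because the unsized source gives the framework no exact size to divide, the split points (and therefore which thread gets which elements) follow a different pattern, which is what changes the counts in the experiment.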

 if (!Thread.currentThread().getName().equals("main") && throwException.compareAndSet(true, false)) {

to this:

 if (!Thread.currentThread().getName().equals("main")) {

This time you would always kill every thread except main, up to the point where ForkJoinPool creates no new threads because the remaining task is too small to split, so no other threads are needed. In this case even fewer tasks would finish.
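Both variations depend on the fact that the calling thread (main in the question) executes part of the stream itself, next to the common-pool workers. A minimal sketch, assuming OpenJDK, records which threads ran at least one element; `CallerParticipation` and `threadsUsed` are hypothetical names.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

// Sketch (hypothetical class): record which threads execute elements of a parallel forEach.
final class CallerParticipation {

    // Returns the names of all threads that processed at least one element.
    static Set<String> threadsUsed(int n) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        IntStream.range(0, n)
                .parallel()
                .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        String caller = Thread.currentThread().getName();
        Set<String> names = threadsUsed(1000);
        // On OpenJDK the calling thread runs part of the work alongside
        // ForkJoinPool.commonPool-worker-N threads.
        System.out.println("caller = " + caller + ", threads = " + names);
        System.out.println("caller participated: " + names.contains(caller));
    }
}
```

This is why the `!Thread.currentThread().getName().equals("main")` condition matters: it decides whether the exception interrupts the caller's own share of the work or a worker's share.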

2) In your second example, where you actually kill the main thread, the code as written does not let you see the other threads that are still running. Change it:

    } catch (Exception e) {
        System.out.println("Caught Exception. Resetting the afterExceptionCount to zero - 0.");
        afterExceptionCount.set(0);
    }

    // Give the other threads some time to finish their work. Try commenting and uncommenting this line to see a big difference in results.
    // TimeUnit.SECONDS.sleep throws InterruptedException, so the test method must declare "throws InterruptedException".
    TimeUnit.SECONDS.sleep(60);

    System.out.println("Overall count: " + overallCount.get());
    System.out.println("After exception count: " + afterExceptionCount.get());
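Instead of a fixed 60-second sleep, a sketch can wait until the common pool has nothing left to run, using `ForkJoinPool.awaitQuiescence(long, TimeUnit)` (a swapped-in technique, not the one in the answer), and then compare the count seen at catch time with the final count. `Stragglers`, `sleepQuietly` and the 1 ms per-element delay are hypothetical choices; the printed numbers vary from run to run, which is exactly the non-determinism the question observed.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

// Sketch (hypothetical class): count elements that run after the caller caught the exception.
final class Stragglers {

    // Returns {processed when the exception was caught, processed once the pool went quiet}.
    static int[] run() {
        AtomicInteger processed = new AtomicInteger();
        AtomicBoolean throwOnce = new AtomicBoolean(true);
        int atCatch = -1;
        try {
            IntStream.range(0, 1000).parallel().forEach(i -> {
                processed.incrementAndGet();
                sleepQuietly(1); // make each element take a little while
                if (throwOnce.compareAndSet(true, false)) {
                    throw new IllegalStateException("boom at index " + i);
                }
            });
        } catch (IllegalStateException e) {
            atCatch = processed.get();
        }
        // Wait until the common pool has no running or queued tasks (or 60 s pass).
        ForkJoinPool.commonPool().awaitQuiescence(60, TimeUnit.SECONDS);
        return new int[] {atCatch, processed.get()};
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
        }
    }

    public static void main(String[] args) {
        int[] counts = run();
        System.out.println("processed when caught: " + counts[0]);
        System.out.println("processed after quiescence: " + counts[1]);
    }
}
```

Waiting for quiescence rather than sleeping avoids guessing a duration, but it also waits for any unrelated work other code has submitted to the common pool.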