Is it possible to specify a custom thread pool for a Java 8 parallel stream? I cannot find it anywhere.
Imagine that I have a server application and I would like to use parallel streams. But the application is large and multi-threaded, so I want to compartmentalize it. I do not want a slow-running task in one module of the application to block tasks from another module.
If I cannot use different thread pools for different modules, it means I cannot safely use parallel streams in most real-world situations.
Try the following example. There are some CPU-intensive tasks executed in separate threads. The tasks leverage parallel streams. The first task is broken, so each step takes 1 second (simulated by a thread sleep). The issue is that the other threads get stuck and wait for the broken task to finish. This is a contrived example, but imagine a servlet app and someone submitting a long-running task to the shared fork-join pool.
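import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import static java.lang.Math.sqrt;
import static java.util.stream.LongStream.range;
import static java.util.stream.LongStream.rangeClosed;

public class ParallelTest {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService es = Executors.newCachedThreadPool();
        es.execute(() -> runTask(1000)); // incorrect (slow) task
        es.execute(() -> runTask(0));
        es.execute(() -> runTask(0));
        es.execute(() -> runTask(0));
        es.execute(() -> runTask(0));
        es.execute(() -> runTask(0));
        es.shutdown();
        es.awaitTermination(60, TimeUnit.SECONDS);
    }

    // All tasks run their parallel streams in ForkJoinPool.commonPool(), so the slow task can hold up the others.
    private static void runTask(int delay) {
        range(1, 1_000_000).parallel().filter(ParallelTest::isPrime).peek(i -> sleep(delay)).max()
                .ifPresent(max -> System.out.println(Thread.currentThread() + " " + max));
    }

    public static boolean isPrime(long n) {
        return n > 1 && rangeClosed(2, (long) sqrt(n)).noneMatch(divisor -> n % divisor == 0);
    }

    // Stand-in for the Utils.sleep helper that the snippet relied on but did not include.
    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}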
I tried a custom ForkJoinPool as follows to adjust the pool size:
Here is the output, showing that the pool uses more threads than the default 4.
But something odd happened when I tried to achieve the same result using a ThreadPoolExecutor as follows, and it failed.
It only starts the parallelStream in a new thread, and everything else stays the same, which again shows that the parallelStream uses the ForkJoinPool to start its child threads.
There actually is a trick for executing a parallel operation in a specific fork-join pool. If you execute it as a task in a fork-join pool, it stays there and does not use the common one.
The trick is based on ForkJoinTask.fork, which specifies: "Arranges to asynchronously execute this task in the pool the current task is running in, if applicable, or using the ForkJoinPool.commonPool() if not inForkJoinPool()."
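A minimal sketch of that trick, assuming a stream of plain integers and a hypothetical helper `threadsUsedBy` that only records which threads processed the elements. The stream is started once from inside a task of a `new ForkJoinPool(4)` and once, for contrast, from a plain single-thread executor. Which pool a parallel stream uses is a JDK implementation detail that follows from the `fork` behaviour quoted above, not something the Stream API documents, so treat this as a workaround rather than a guarantee.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

final class CustomPoolDemo {

    // Runs a parallel stream and records the name of every thread that processed an element.
    static Set<String> threadsUsedBy(int elements) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        IntStream.range(0, elements).parallel()
                .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    // The trick: start the terminal operation from inside a task running in our own pool,
    // so the stream's subtasks are forked into that pool instead of the common one.
    static Set<String> inCustomPool(int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.submit(() -> threadsUsedBy(10_000)).join();
        } finally {
            pool.shutdown();
        }
    }

    // For contrast: a plain executor only changes the calling thread;
    // the stream's subtasks still go to ForkJoinPool.commonPool().
    static Set<String> inPlainExecutor() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            return CompletableFuture.supplyAsync(() -> threadsUsedBy(10_000), executor).join();
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("custom pool:    " + inCustomPool(4));
        System.out.println("plain executor: " + inPlainExecutor());
    }
}
```

On a typical JDK the first set contains only `ForkJoinPool-N-worker-M` names, while the second contains the executor's `pool-N-thread-1` and, usually, some `ForkJoinPool.commonPool-worker-M` names.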
If you don't mind using a third-party library, with cyclops-react you can mix sequential and parallel Streams within the same pipeline and provide custom ForkJoinPools. For example:
Or, if we wished to continue processing within a sequential Stream:
[Disclosure: I am the lead developer of cyclops-react]
The parallel streams use the default ForkJoinPool.commonPool, which by default has one fewer thread than you have processors, as returned by Runtime.getRuntime().availableProcessors(). (This means that parallel streams use all your processors, because they also use the calling thread, e.g. the main thread.)
This also means that if you have nested parallel streams or multiple parallel streams started concurrently, they will all share the same pool. Advantage: you will never use more than the default (the number of available processors). Disadvantage: you may not get "all the processors" assigned to each parallel stream you initiate (if you happen to have more than one). (Apparently you can use a ManagedBlocker to circumvent that.)
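A small sketch of both points in the paragraph above: it prints the processor count next to `ForkJoinPool.getCommonPoolParallelism()`, and it wraps a blocking sleep in a `ForkJoinPool.ManagedBlocker`, which tells the pool that a worker is about to block so it may start a spare thread. The names `SleepBlocker`, `managedSleep`, `runBlockingSteps` and `expectedDefaultParallelism` are illustrative only, and whether compensation actually happens is up to the pool, so the printed timing is not guaranteed.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

final class CommonPoolBlockingDemo {

    // Default common-pool size when the parallelism property is not set: processors - 1, at least 1.
    static int expectedDefaultParallelism() {
        return Math.max(1, Runtime.getRuntime().availableProcessors() - 1);
    }

    // Wraps a sleep so the pool is told this worker is blocking and may add a spare thread.
    static final class SleepBlocker implements ForkJoinPool.ManagedBlocker {
        private final long millis;
        private boolean done;

        SleepBlocker(long millis) {
            this.millis = millis;
        }

        @Override
        public boolean block() throws InterruptedException {
            if (!done) {
                Thread.sleep(millis);
                done = true;
            }
            return true; // no further blocking is needed
        }

        @Override
        public boolean isReleasable() {
            return done;
        }
    }

    static void managedSleep(long millis) {
        try {
            ForkJoinPool.managedBlock(new SleepBlocker(millis));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Runs `steps` blocking steps on the common pool and returns how many were executed.
    static int runBlockingSteps(int steps, long millis) {
        return IntStream.range(0, steps).parallel()
                .map(i -> {
                    managedSleep(millis);
                    return 1;
                })
                .sum();
    }

    public static void main(String[] args) {
        System.out.println("processors:              " + Runtime.getRuntime().availableProcessors());
        System.out.println("common pool parallelism: " + ForkJoinPool.getCommonPoolParallelism());
        long start = System.nanoTime();
        int done = runBlockingSteps(32, 100);
        System.out.printf("%d blocking steps took %d ms%n", done, (System.nanoTime() - start) / 1_000_000);
    }
}
```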
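To change the way parallel streams are executed, you can either submit the parallel stream execution to your own ForkJoinPool:
yourFJP.submit(() -> stream.parallel().forEach(doSomething)).get();
or change the size of the common pool:
System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "20");
for a target parallelism of 20 threads. The property has to be set before the common pool is first used, because it is read only once, when the pool is created.
Here is an example of the latter on my machine, which has 8 processors. If I run the following program: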
The output is:
So you can see that the parallel stream processes 8 items at a time, i.e. it uses 8 threads. However, if I uncomment the commented line, the output is:
This time, the parallel stream has used 20 threads and all 20 elements in the stream have been processed concurrently.
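A sketch of the system-property approach, assuming nothing in the JVM has touched ForkJoinPool yet: the common pool reads `java.util.concurrent.ForkJoinPool.common.parallelism` only once, when it is created, so setting the property later has no effect. Passing `-Djava.util.concurrent.ForkJoinPool.common.parallelism=20` on the command line avoids that ordering problem. The helper names are hypothetical, and the sleep only stands in for slow work so that the elements overlap in time.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

final class CommonPoolSizeDemo {

    static final String PARALLELISM = "java.util.concurrent.ForkJoinPool.common.parallelism";

    // Must run before anything uses ForkJoinPool: the property is read only when the common
    // pool is created. Returns the parallelism that is actually in effect.
    static int configureCommonPool(int parallelism) {
        System.setProperty(PARALLELISM, Integer.toString(parallelism));
        return ForkJoinPool.getCommonPoolParallelism();
    }

    // Counts the distinct threads that process `elements` elements, each sleeping `millis` ms.
    static int threadsUsed(int elements, long millis) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        IntStream.range(0, elements).parallel().forEach(i -> {
            names.add(Thread.currentThread().getName());
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        return names.size();
    }

    public static void main(String[] args) {
        System.out.println("common pool parallelism: " + configureCommonPool(20));
        System.out.println("threads used: " + threadsUsed(20, 500));
    }
}
```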
To measure the actual number of threads used, you can check Thread.activeCount():
This can produce an output like the following on a 4-core CPU:
Without .parallel() it gives:
If you don't need a custom ThreadPool but would rather limit the number of concurrent tasks, you can use:
(The duplicate question asking for this is locked, so please bear with me here.)
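One possible way to limit the number of concurrent tasks without a custom pool, and not necessarily the approach this answer had in mind, is to guard the work with a `java.util.concurrent.Semaphore`. In this sketch `work` is a hypothetical placeholder for the real per-element task, and `runWithLimit` reports the highest concurrency it observed. Note the trade-off: a worker waiting for a permit stays blocked, so the pool's threads are still occupied; they just do less at once.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

final class LimitedConcurrencyDemo {

    // Processes every element of a parallel stream, but lets at most `limit` elements
    // run `work` at the same time. Returns the highest concurrency that was observed.
    static int runWithLimit(int elements, int limit) {
        Semaphore permits = new Semaphore(limit);
        AtomicInteger active = new AtomicInteger();
        AtomicInteger maxActive = new AtomicInteger();
        IntStream.range(0, elements).parallel().forEach(i -> {
            permits.acquireUninterruptibly(); // blocks the worker until a slot is free
            maxActive.accumulateAndGet(active.incrementAndGet(), Math::max);
            try {
                work();
            } finally {
                active.decrementAndGet();
                permits.release();
            }
        });
        return maxActive.get();
    }

    // Hypothetical placeholder for the real per-element task: it just sleeps briefly.
    static void work() {
        try {
            Thread.sleep(5);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("max concurrent tasks: " + runWithLimit(100, 2));
    }
}
```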