I need to get an ideal number of threads in a batch program, which runs in batch framework supporting parallel mode, like parallel step in Spring Batch.
As far as I know, it is not good that there are too many threads to execute steps of a program, it may has negative effect to the performance of the program. Some factors could arise performance degradation(context switching, race condition when using shared resources(locking, sync..) ... (are there any other factors?)).
Of course the best way of getting the ideal number of threads is for me to have actual program tests adjusting the number of threads of the program. But in my situation, it is not that easy to have the actual test because many things are needed for the tests(persons, test scheduling, test data, etc..), which are too difficult for me to prepare now. So, before getting the actual tests, I want to know the way of getting a guessable ideal number of threads of my program, as best as I can. What should I consider to get the ideal number of threads(steps) of my program?? number of CPU cores?? number of processes on a machine on which my program would run?? number of database connection?? Is there a rational way such as a formula in a situation like this?
This is tremendously difficult to do without a lot of knowledge over the actual code that you are threading. As @Erwin mentions, IO versus CPU-bound operations are the key bits of knowledge that are needed before you can determine even if threading an application will result is any improvements. Even if you did manage to find the sweet spot for your particular hardware, you might boot on another server (or a different instance of a virtual cloud node) and see radically different performance numbers.
One thing to consider is to change the number of threads at runtime. The
ThreadPoolExecutor.setCorePoolSize(...)
is designed to be called after the thread-pool is in operation. You could expose some JMX hooks to do this for you manually.You could also allow your application to monitor the application or system CPU usage at runtime and tweak the values based on that feedback. You could also keep
AtomicLong
throughput counters and dial the threads up and down at runtime trying to maximize the throughput. Getting that right might be tricky however.I typically try to:
The most important consideration is whether your application/calculation is CPU-bound or IO-bound.
General Equation:
Number of Threads <= (Number of cores) / (1 - blocking factor)
Where 0 <= blocking factor < 1
Number of Core of a machine :
Runtime.getRuntime().availableProcessors()
Number of Thread you can parallelism, you will get by printing out this code :
And the number parallelism is Number of Core of your machine - 1. Because that one is for main thread.
Source link
Time : 1:09:00