I have some pseudocode that I need to parallelize:
int thread_count=8;
for(int i=1;i<100000;i++)
{
    do_work(i,record);
}
The do_work function does its work based on i and writes its output to record. Now I would like to convert this serial implementation into a multithreaded one.
I know that I could do something like:
int thread_count=8;
for(int i=1;i<100000;i++)
{
    boost::thread t1(do_work,i,std::ref(record));
}
But this will create thousands of threads, which will hurt performance. I believe this problem is about the most natural case that calls for multithreading, and I would like to know what the standard C++ practice is for solving it. Thank you.
Creating multiple threads helps performance only up to the number of cores you have, since each thread can then run on its own core without affecting the others. However, you said that the do_work() function writes to a record. If that variable is shared between the threads, protecting it with a mutex will badly degrade performance, even if each thread runs on its own core.
A spinlock can reduce the overhead compared with a mutex in this case, but it is still based on std::atomic_flag (at least boost::spinlock is when compiled with GNU g++), which is an atomic variable and thus still carries the overhead of synchronizing caches between cores. You should only parallelize this to the extent that each thread can run on an independent core.
The exception is a program such as a server, where you need to serve requests without blocking others. In that case a pool of threads (perhaps one that grows and shrinks dynamically) is the right option. Servers like Apache also use thread pools in many cases.
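To make the "one thread per core" advice concrete, here is a minimal sketch using only the standard library. It assumes that record can be laid out so that each index i writes to its own slot, which avoids the mutex entirely; the do_work body, run_chunked, and default_thread_count are hypothetical names made up for illustration, not part of the question's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the question's do_work: each call writes only
// to its own slot of record, so no lock is needed (an assumption).
void do_work(int i, std::vector<long>& record) {
    record[static_cast<std::size_t>(i)] = static_cast<long>(i) * 2;
}

// Runs do_work for every i in [first, last) on at most thread_count threads,
// giving each thread one contiguous chunk of the range.
void run_chunked(int first, int last, unsigned thread_count,
                 std::vector<long>& record) {
    assert(thread_count > 0 && first <= last);
    const int threads = static_cast<int>(thread_count);
    const int total = last - first;
    const int chunk = (total + threads - 1) / threads;  // ceiling division
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        const int begin = std::min(last, first + t * chunk);
        const int end = std::min(last, begin + chunk);
        if (begin == end) break;  // nothing left to hand out
        workers.emplace_back([begin, end, &record] {
            for (int i = begin; i < end; ++i) do_work(i, record);
        });
    }
    for (std::thread& w : workers) w.join();  // wait for all chunks
}

// One thread per hardware thread, or 1 if the runtime cannot tell.
unsigned default_thread_count() {
    return std::max(1u, std::thread::hardware_concurrency());
}
```

With a record sized to hold index 99999, the loop from the question would become run_chunked(1, 100000, default_thread_count(), record). If do_work really has to update shared state, each thread could instead fill a private buffer that is merged after the joins.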
A good approach would be to use a parallel algorithm, such as TBB parallel_for_each. Under the hood it creates a (global) thread pool for you and schedules chunks of work (tasks) across all available CPUs without oversubscribing, i.e. it won't create more worker threads than there are available CPUs.
Can you use OpenMP? Just add #pragma omp parallel for before the loop and enable OpenMP support in the compiler.
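A rough sketch of what that could look like, again assuming each iteration writes only to its own slot of record (the do_work body and run_openmp are hypothetical). With GCC or Clang, OpenMP is enabled with -fopenmp; without that flag the pragma is ignored and the loop simply runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's do_work: writes only to slot i,
// so iterations are independent of each other (an assumption).
void do_work(int i, std::vector<long>& record) {
    record[static_cast<std::size_t>(i)] = static_cast<long>(i) * 2;
}

// Runs do_work for every i in [first, last). With OpenMP enabled, the
// runtime splits the iterations across its own team of threads.
void run_openmp(int first, int last, std::vector<long>& record) {
    assert(first <= last);
    #pragma omp parallel for
    for (int i = first; i < last; ++i) {
        do_work(i, record);
    }
}
```

The number of threads is chosen by the OpenMP runtime and can usually be capped with the OMP_NUM_THREADS environment variable, so the thread_count variable from the question is not needed here.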
This is a perfect candidate for a thread pool. If you don't need to wait on the results, a simple thread pool will suffice. If you need to wait on the results and do other work based on them, you need a future-based pool. I wrote both versions in C++ for my work, and I would be happy to share them if you like. Thread pools are a great way to make good use of multicore processor resources.
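For readers who want to see the shape of a future-based pool, here is a minimal sketch built on the standard library. It is not the answerer's implementation; the ThreadPool class and its submit method are hypothetical names, and the design (one shared queue guarded by a mutex and a condition variable) is just one simple way to do it.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool: workers pull jobs from one shared queue.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t thread_count) {
        assert(thread_count > 0);
        for (std::size_t t = 0; t < thread_count; ++t)
            workers_.emplace_back([this] { worker_loop(); });
    }

    // Lets the workers drain the queue, then joins them.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (std::thread& w : workers_) w.join();
    }

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    // Queues a callable and returns a future for its result.
    template <class F>
    auto submit(F f) -> std::future<decltype(f())> {
        using R = decltype(f());
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push([task] { (*task)(); });
        }
        wake_.notify_one();
        return result;
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and queue drained
                job = std::move(tasks_.front());
                tasks_.pop();
            }
            job();  // run outside the lock so other workers can proceed
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

A caller would create the pool once, for example ThreadPool pool(8), submit one task per chunk of work rather than one per i to keep queue overhead low, and call get() on the returned futures when the results are needed. If the results are never needed, the futures can simply be discarded, which gives the "simple" variant.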