
Question:

I have some pseudocode that needs to be parallelized:

int thread_count=8;
for(int i=1;i<100000;i++)
{
    do_work(i,record);
}

The do_work function does its work based on i and writes its output to record. I would like to convert this serial implementation into a multithreaded one.

I know I could do something like this:

int thread_count=8;
for(int i=1;i<100000;i++)
{
    boost::thread t1(do_work,i,std::ref(record));
}

But this creates a new thread for every iteration, tens of thousands of threads in total, which will hurt performance. I believe this is the most natural kind of problem that calls for multithreading, and I would like to know the standard C++ practice for solving it. Thank you.

Answer 1:

Creating multiple threads improves performance only up to the number of cores you have, since each thread can then run on its own core without affecting the others. However, you said the do_work() function writes to record. If that variable is shared between the threads, protecting it with a mutex will severely degrade performance, even if each thread runs on its own core.
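A minimal sketch of this advice, assuming the work on record can be split into per-thread partial results that are merged at the end (the question does not say whether that is possible). Record, do_work and run_partitioned are hypothetical stand-ins: here a record is just a running sum of i*i. The range is cut into one contiguous block per hardware thread, and each thread accumulates into a local variable, so no mutex is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-ins for the question's record and do_work:
// a record is a running sum, and do_work adds i*i to it.
using Record = long long;

void do_work(int i, Record& record) {
    record += static_cast<long long>(i) * i;
}

// Runs do_work for every i in [first, last) using one thread per hardware
// thread. Each thread accumulates into its own local Record, so no mutex is
// needed; the partial results are merged after all threads are joined.
Record run_partitioned(int first, int last) {
    assert(first <= last);
    const unsigned n_threads = std::max(1u, std::thread::hardware_concurrency());
    const long long total = static_cast<long long>(last) - first;

    std::vector<Record> partial(n_threads, 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        // Contiguous block [begin, end) of roughly total / n_threads indices.
        const int begin = static_cast<int>(first + total * t / n_threads);
        const int end = static_cast<int>(first + total * (t + 1) / n_threads);
        workers.emplace_back([begin, end, t, &partial] {
            Record local = 0;                      // private accumulator
            for (int i = begin; i < end; ++i) do_work(i, local);
            partial[t] = local;                    // one write per thread
        });
    }
    for (auto& w : workers) w.join();

    Record result = 0;
    for (Record r : partial) result += r;          // serial merge step
    return result;
}
```

Because each thread writes its partial result only once, the threads do not keep contending for shared memory while they work; the only synchronization is the final join.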

A spinlock can reduce the time overhead of a mutex in this case, but it is still built on std::atomic_flag (at least boost::spinlock is, when compiled with GNU g++), which is an atomic variable and therefore still incurs the cost of keeping the caches synchronized. You should parallelize this only to the extent that each thread can run on its own core.
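For illustration, a spinlock built on std::atomic_flag can look like the following. This is a generic sketch, not boost's implementation; Spinlock and count_with_spinlock are hypothetical names. It never puts a waiting thread to sleep, but every lock() is still an atomic read-modify-write on one shared flag, which is where the cache-synchronization cost comes from.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Minimal spinlock on top of std::atomic_flag (generic sketch, not
// boost's implementation). It satisfies BasicLockable, so it works
// with std::lock_guard.
class Spinlock {
public:
    void lock() {
        // test_and_set returns the previous value: keep retrying while
        // another thread already holds the flag.
        while (flag_.test_and_set(std::memory_order_acquire)) {
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Several threads increment one shared counter under the spinlock.
// Every lock() is an atomic read-modify-write on the shared flag, so the
// cores must keep passing that cache line between them.
long long count_with_spinlock(int n_threads, int per_thread) {
    assert(n_threads > 0 && per_thread >= 0);
    Spinlock lock;
    long long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i) {
                std::lock_guard<Spinlock> guard(lock);
                ++counter;
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```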

The exception is a program such as a server, which must serve requests without one request blocking the others. In that case a pool of threads (possibly one that grows and shrinks dynamically) is the right option; servers like Apache also use thread pools in many cases.

Answer 2:

A good approach is to use a parallel algorithm, such as TBB's parallel_for_each. Under the hood it creates a (global) thread pool for you and schedules chunks of work (tasks) across all available CPUs without oversubscribing, i.e. it won't create more worker threads than there are available CPUs.
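TBB's real scheduler is more sophisticated (it uses work stealing), but the basic idea of handing out chunks of an index range to a fixed set of workers can be sketched with the standard library alone. parallel_for_chunked is a hypothetical helper, not TBB's API: it starts one worker per hardware thread, and each worker repeatedly claims the next chunk from a shared atomic counter until the range is exhausted.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper, not TBB's API: calls body(i) for every i in
// [first, last). One worker per hardware thread repeatedly claims the next
// chunk of indices from a shared atomic counter, so faster workers simply
// take more chunks. Assumes last + n_workers * chunk still fits in an int.
template <typename Body>
void parallel_for_chunked(int first, int last, int chunk, Body body) {
    assert(first <= last && chunk > 0);
    std::atomic<int> next{first};  // start of the next unclaimed chunk
    const unsigned n_workers = std::max(1u, std::thread::hardware_concurrency());

    std::vector<std::thread> workers;
    for (unsigned w = 0; w < n_workers; ++w) {
        workers.emplace_back([&] {
            for (;;) {
                const int begin = next.fetch_add(chunk);  // claim a chunk
                if (begin >= last) break;                 // range exhausted
                const int end = std::min(begin + chunk, last);
                for (int i = begin; i < end; ++i) body(i);
            }
        });
    }
    for (auto& t : workers) t.join();
}
```

body is called concurrently from several threads, so it must be safe to run in parallel, for example by writing only to its own element of an output vector. Handing out chunks dynamically keeps all cores busy even when iterations take uneven amounts of time.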

Answer 3:

Can you use OpenMP? Just add #pragma omp parallel for before the loop and enable OpenMP support in your compiler (for example, -fopenmp with GCC).
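A sketch of what this looks like, assuming each iteration's result can be stored in its own slot of a vector (compute is a hypothetical stand-in for do_work). The second function shows OpenMP's reduction clause, which fits the case where record is a single accumulator: each thread gets a private copy and the copies are combined at the end.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-iteration work standing in for do_work.
long long compute(int i) { return static_cast<long long>(i) * i; }

// Each iteration writes only record[i], so iterations can run in any order
// on any thread. With OpenMP enabled (e.g. g++ -fopenmp) the loop is split
// among a team of threads; without it the pragma is ignored and the loop
// runs serially with the same result.
std::vector<long long> run_openmp(int n) {
    assert(n >= 1);
    std::vector<long long> record(n);  // value-initialized to 0
    #pragma omp parallel for
    for (int i = 1; i < n; ++i) {
        record[i] = compute(i);
    }
    return record;
}

// If the record is a single accumulator, reduction(+:sum) gives every
// thread a private copy of sum and adds the copies together at the end,
// instead of making the threads lock a shared variable.
long long sum_openmp(int n) {
    long long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i < n; ++i) {
        sum += compute(i);
    }
    return sum;
}
```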

Answer 4:

This is a perfect candidate for a thread pool. If you don't need to wait for the results, a simple thread pool will suffice. If you need to wait for the results and do further work based on them, you need a future-based pool. I wrote both versions in C++ for my work and would be happy to share them if you like. Thread pools are a great way to make optimal use of multicore processor resources.
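The answerer's own pools are not included in the thread, so the following is only a minimal sketch of a future-based pool under stated assumptions: a fixed number of workers, a single mutex-protected task queue, and the hypothetical names ThreadPool and submit. submit wraps each call in a std::packaged_task and returns its std::future; simply ignoring that future gives the fire-and-forget variant.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal fixed-size thread pool with a single mutex-protected task queue.
// submit() returns a std::future; callers that never read it get the
// "simple" fire-and-forget behaviour. Illustrative sketch only.
class ThreadPool {
public:
    explicit ThreadPool(unsigned n_threads) {
        assert(n_threads > 0);
        for (unsigned i = 0; i < n_threads; ++i)
            workers_.emplace_back([this] { worker_loop(); });
    }

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    // Workers finish every task already queued, then exit and are joined.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }

    // Queues f(args...) (arguments are copied) and returns a future for its result.
    template <typename F, typename... Args>
    auto submit(F f, Args... args) -> std::future<decltype(f(args...))> {
        using R = decltype(f(args...));
        // packaged_task is move-only, but std::function needs a copyable
        // callable, so the task is held through a shared_ptr.
        auto task = std::make_shared<std::packaged_task<R()>>(
            [f, args...]() mutable { return f(args...); });
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.emplace([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (stopping_ && tasks_.empty()) return;
                job = std::move(tasks_.front());
                tasks_.pop();
            }
            job();  // run the task without holding the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

For the question's loop, one would create the pool once with roughly one worker per core, submit one task per index (or, better, per chunk of indices), and call get() on the futures to wait for the results.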