I'm trying to parallelize parts of a project that relies on a lot of recursive algorithms.
Most of them are some form of binary tree creation or traversal and processing.
I'm stuck using GCC v. 4.1.2 on RedHat and the VC++ compiler on Windows (both don't support OpenMP 3.0 with its convenient task
construct).
I found this question which seems to get the job done with nested parallel sections and some throttling to prevent an exorbitant number of threads.
My question: any way to avoid this approach? Some of the functions are called at each timestep, the overhead of repeatedly creating and destroying a thread team is unacceptable.
Here's the basic structure of a recursive function I've been using, in line with the linked question:
extern int threads;
omp_set_nested(1); omp_set_num_threads(2);
void cell::updateRecursive() {
// do stuff for cell for this timestep
#pragma omp flush(threads)
if (threads>=omp_get_num_procs()) {
child0->updateRecursive(); child1->updateRecursive(); // no new threads
} else {
#pragma omp atomic
threads++;
#pragma omp flush(threads)
#pragma omp parallel sections nowait
{
#pragma omp seciton
child0->updateRecursive();
#pragma omp section
child1->updateRecursive();
}
#pragma omp atomic
threads--;
}
}
This would be sufficient if this function weren't called so often. I'd like a way to recurse that can use an existing team of threads to perform work, rather than creating one while traversing the tree.
Would this be possible at all without task
? I've experimented with simply using sections
, but apparently they can't be nested to use an existing team of threads.