I'm getting a "bus error" from an OpenMP parallel section of code. I recreated a simple version of my problem below. The code essentially makes many calls to the function uniform_distribution, which draws an integer between 0 and 20000 using Boost's uniform_int_distribution.

This post warns of two threads accessing the same object. I'm guessing that's eng in my case. (Unfortunately, I don't know how to write "an appropriate mutex wrapper", as that post suggests.)

A possible dirty solution I thought of was to create a local eng inside the #pragma for loop and to pass that as an argument to uniform_distribution. I don't like this idea because in my real code I'm calling many functions, and passing a local eng around would be cumbersome. Also, my concern is that different threads will generate the same random number sequence if I declare eng inside uniform_distribution. So I have two requirements. How do I parallelize in a way that:
- Each thread is generating probabilistically independent draws from other threads?
- No race conditions occur on the RNG?
Thanks; any help is warmly appreciated.
#include <omp.h>
#include <boost/random/mersenne_twister.hpp>          // declares boost::random::mt19937
#include <boost/random/uniform_int_distribution.hpp>

// A single global engine, used by every call to uniform_distribution().
boost::random::mt19937 eng;

int uniform_distribution(int rangeLow, int rangeHigh) {
    boost::random::uniform_int_distribution<int> unirv(rangeLow, rangeHigh);
    return unirv(eng);
}

int main()
{
    # pragma omp parallel for private(eng)
    for (int bb=0; bb<10000; bb++)
        for (int i=0; i<20000; i++)
            int a = uniform_distribution(0,20000);
    return 0;
}
When you parallelize some code, you must consider shared resources. They can cause data races, which may eventually break your program. (Note: not all data races will break your program.)

In your case, as you correctly suspected, eng is shared by two or more threads, and that must be avoided for correct execution. A solution for your case is privatization: making a per-thread copy of the shared resource, so that each thread gets its own separate copy of eng. There are a number of ways to privatize eng:

(1) Try the threadprivate directive, for example #pragma omp threadprivate(eng). However, some compilers may not support non-POD types with this directive.

(2) Where threadprivate is not available, use an array of eng and index it by thread ID. Declare it as eng[MAX_THREAD], then access it with eng[omp_get_thread_num()].
However, the second solution has to account for false sharing, which can severely hurt performance. It's best to guarantee that each item in eng[MAX_THREAD] is allocated on a separate cache-line boundary (cache lines are typically 64 bytes on modern desktop CPUs). There are several ways to avoid false sharing. The simplest is padding, e.g. a char padding[x] member in a struct that holds eng.
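A sketch of option (2) with the padding idea follows. It keeps the same assumptions: C++11 <random> instead of Boost.Random, and a 64-byte cache line. PaddedEngine, engines, init_engines and draw_many are hypothetical names. A std::vector sized by omp_get_max_threads() stands in for the fixed eng[MAX_THREAD] array.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks so the sketch also builds without OpenMP.
inline int omp_get_thread_num() { return 0; }
inline int omp_get_max_threads() { return 1; }
#endif

constexpr int kCacheLine = 64;  // assumed cache-line size in bytes

// At least one cache line of padding between consecutive engines means
// no 64-byte line holds state from two different threads' engines.
struct PaddedEngine {
    std::mt19937 eng;
    char padding[kCacheLine];
};

static std::vector<PaddedEngine> engines;  // one slot per thread

// Must run before the parallel region; seeds each slot differently.
void init_engines(unsigned base_seed) {
    engines.resize(static_cast<std::size_t>(omp_get_max_threads()));
    for (std::size_t t = 0; t < engines.size(); ++t)
        engines[t].eng.seed(base_seed + static_cast<unsigned>(t));
}

int uniform_distribution(int rangeLow, int rangeHigh) {
    assert(rangeLow <= rangeHigh);
    std::uniform_int_distribution<int> unirv(rangeLow, rangeHigh);
    // Each thread indexes only its own slot.
    return unirv(engines[static_cast<std::size_t>(omp_get_thread_num())].eng);
}

long long draw_many(unsigned base_seed, int outer, int inner) {
    init_engines(base_seed);
    long long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int bb = 0; bb < outer; ++bb)
        for (int i = 0; i < inner; ++i)
            sum += uniform_distribution(0, 20000);
    return sum;
}
```

init_engines runs before the parallel region, so during the loop each thread only reads and advances its own slot.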
You have two options:
First, an example of mutual exclusion:
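A minimal sketch of mutual exclusion follows. It assumes an OpenMP named critical section serves as the lock (a std::mutex would work as well) and uses C++11 <random> in place of Boost.Random. The seed 12345 and the helper name work are arbitrary placeholders.

```cpp
#include <cassert>
#include <random>

// One engine shared by all threads; the seed is an arbitrary placeholder.
static std::mt19937 eng(12345u);

int uniform_distribution(int rangeLow, int rangeHigh) {
    assert(rangeLow <= rangeHigh);
    std::uniform_int_distribution<int> unirv(rangeLow, rangeHigh);
    int value;
    // Only one thread at a time may advance the shared engine.
    #pragma omp critical(rng)
    value = unirv(eng);
    return value;
}

long long work(int outer, int inner) {
    long long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int bb = 0; bb < outer; ++bb) {
        for (int i = 0; i < inner; ++i) {
            int a = uniform_distribution(0, 20000);
            // presumably some more code...
            sum += a;
        }
    }
    return sum;
}
```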
Next, an example of thread-local storage with seeding:
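One possible version of thread-local storage with seeding follows. It assumes C++11 thread_local. In practice, GCC and Clang run OpenMP threads as native threads, so each gets its own copy, though older OpenMP specifications do not promise this interaction. Seeding comes from std::random_device, whose quality is platform-specific. thread_engine and draw_many are hypothetical names.

```cpp
#include <cassert>
#include <random>

// One engine per thread, created the first time that thread calls this.
std::mt19937& thread_engine() {
    thread_local std::mt19937 eng = [] {
        std::random_device rd;                      // non-deterministic where supported
        std::seed_seq seq{rd(), rd(), rd(), rd()};  // expanded into the full engine state
        return std::mt19937(seq);
    }();
    return eng;
}

int uniform_distribution(int rangeLow, int rangeHigh) {
    assert(rangeLow <= rangeHigh);
    std::uniform_int_distribution<int> unirv(rangeLow, rangeHigh);
    return unirv(thread_engine());
}

long long draw_many(int outer, int inner) {
    long long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int bb = 0; bb < outer; ++bb)
        for (int i = 0; i < inner; ++i)
            sum += uniform_distribution(0, 20000);
    return sum;
}
```

Any function can reach thread_engine(), so no engine has to be passed around. The price of random_device seeding is that runs are not repeatable.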
Both of these snippets are just illustrative. Depending on your requirements (say, security-related vs. a game vs. modelling), you may want to pick one over the other. You will probably also want to change the exact implementation to suit your usage. For instance, how you seed the generator is important if you want it to be either repeatable or closer to truly random (whether the latter is possible is system-specific). This applies to both solutions equally, though reproducibility is harder to get in the mutual exclusion case.
The thread-local generator may run faster while the mutual exclusion case should use less memory.
EDIT: To be clear, the mutual exclusion solution only makes sense if generating the random numbers is not the bulk of the thread's work. That is, the "// presumably some more code..." in the example has to exist and take a non-trivial amount of time to complete. The critical section only needs to encompass the access to the shared variable. Changing your architecture a little would give you finer control over that (and, in the thread-local storage case, could also let you avoid passing an eng reference around).

I think the most convenient solution would involve a C++11 thread_local RNG seeded with a number that is unique to each thread, such as the thread ID. For example, you can XOR the system time with the thread ID to seed the RNG. (Note: if you are going to use C++11, you can also use <random> instead of Boost.Random.)

If you can't use C++11, you can use boost::thread instead to get similar behavior; see also the Boost page on thread-local storage.
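Here is a rough C++11 sketch of the thread_local plus XOR-seeding idea, using <random> as the note suggests. Hashing std::this_thread::get_id() to get a per-thread number is an assumption; omp_get_thread_num() would be the OpenMP-specific alternative. make_thread_seed, thread_engine and draw_many are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <random>
#include <thread>

// Assumed seeding scheme: system time XOR a hash of the current thread's ID.
static std::mt19937::result_type make_thread_seed() {
    const auto now = std::chrono::system_clock::now().time_since_epoch().count();
    const std::size_t tid = std::hash<std::thread::id>{}(std::this_thread::get_id());
    return static_cast<std::mt19937::result_type>(static_cast<std::size_t>(now) ^ tid);
}

// Initialised once per thread, on that thread's first call.
std::mt19937& thread_engine() {
    thread_local std::mt19937 eng(make_thread_seed());
    return eng;
}

int uniform_distribution(int rangeLow, int rangeHigh) {
    assert(rangeLow <= rangeHigh);
    std::uniform_int_distribution<int> unirv(rangeLow, rangeHigh);
    return unirv(thread_engine());
}

long long draw_many(int outer, int inner) {
    long long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (int bb = 0; bb < outer; ++bb)
        for (int i = 0; i < inner; ++i)
            sum += uniform_distribution(0, 20000);
    return sum;
}
```

The XORed seeds are very likely, but not guaranteed, to differ between threads.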