I'm getting a "bus error" from an OpenMP parallel section of code. I recreated a simple version of my problem below. The code essentially makes many calls to the function uniform_distribution
, which draws an integer between 0 and 20000 using Boost's uniform_int_distribution.
This post warns of two threads accessing the same object. I'm guessing that's eng
in my case. (Unfortunately I don't know how to write "an appropriate mutex wrapper", as that post suggests).
A possible dirty solution I thought of was to create a local eng
inside the #pragma for
loop and to pass that as an argument to uniform_distribution
. I don't like this idea because in my real code, I'm calling many functions, and passing a local eng
would be cumbersome. Also, my concern is that different threads will generate the same random number sequence if I declare eng
inside uniform_distribution
. So I have two requirements: How do I parallelize in a way that
- Each thread is generating probabilistically independent draws from other threads?
- No race conditions occur on the RNG?
Thanks; any help is warmly appreciated.
#include <omp.h>
#include <boost/random/uniform_int_distribution.hpp>
boost::random::mt19937 eng;
int uniform_distribution(int rangeLow, int rangeHigh) {
boost::random::uniform_int_distribution<int> unirv(rangeLow, rangeHigh);
return unirv(eng);
}
int main()
{
# pragma omp parallel for private(eng)
for (int bb=0; bb<10000; bb++)
for (int i=0; i<20000; i++)
int a = uniform_distribution(0,20000);
return 0;
}
I think the most convenient solution would involve a thread_local
RNG and a seeding that involves the thread ID as a unique number for each thread, for example, you can do a XOR between the system time and the thread-id to seed the RNG. Something along the lines of (using C++11):
#include <omp.h>
#include <boost/random/uniform_int_distribution.hpp>
#include <thread>
#include <ctime>
boost::random::mt19937& get_rng_engine() {
thread_local boost::random::mt19937 eng(
reinterpret_cast<unsigned int>(std::time(NULL)) ^ std::this_thread::get_id());
return eng;
};
(NOTE: you can also use <random>
if you are going to use C++11)
If you can't use C++11 then you can use boost::thread
instead to have a similar behavior, see the Boost page on thread-local storage too.
When you parallelize some code, you must consider the shared resource, which can cause data races, in turn, eventually may break your program. (Note: not all data races will break your program.)
In your case, as you expected correctly, eng
is the shared by two or more threads, which must be avoided for the correct execution.
A solution for your case is privatization: making a per-thread copy for the shared resources. You need to create a separate copy of eng
.
There are a number of way to do privatization for eng
:
(1) Try to use threadprivate
directive (link): For example, #pragma omp threadprivate(eng)
. However, some compilers may not support non-POD structures for this directive.
(2) In case where threadprivate
is not available, use an array of eng
and access with thread id: declare such as eng[MAX_THREAD]
. Then, access with thread id: eng[omp_get_thread()]
.
However, the second solution needs to consider false sharing, which can severely hurt the performance. It's best to guarantee each item in eng[MAX_THREAD]
is allocated on separate cache line boundary, which is typically 64-byte in modern desktop CPUs. There are also several ways to avoid false sharing. The simplest solution would be using padding: e.g., char padding[x]
in a struct
that holds eng
.
You have two options:
- have individual random number generators for each thread and seed them differently
- use mutual exclusion
First, an example of mutual exclusion:
# pragma omp parallel for
for (int bb=0; bb<10000; bb++)
{
for (int i=0; i<20000; i++)
{
// enter critical region, disallowing simulatneous access to eng
#pragma omp critical
{
int a = uniform_distribution(0,20000);
}
// presumably some more code...
}
// presumably some more code...
}
Next, an example of thread-local storage with seeding:
# pragma omp parallel
{
// declare and seed thread-specific generator
boost::random::mt19937 eng(omp_get_thread_num());
#pragma omp for
for (int bb=0; bb<10000; bb++)
{
for (int i=0; i<20000; i++)
{
int a = uniform_distribution(0,20000, eng);
// presumably some more code...
}
// presumably some more code...
}
}
Both of these snippets are just illustrative, depending on your requirements (say security related vs. a game vs. modelling) you may want to pick one over the other. You will probably also want to change the exact implementation to suit your usage. For instance, how you seed the generator is important if you want it to be either repeatable or closer to truly random (whether that's possible is system specific). This applies to both solutions equally (though to get reproducibility in the mutual exclusion case is harder).
The thread-local generator may run faster while the mutual exclusion case should use less memory.
EDIT: To be clear, the mutual exclusion solutions only makes sense if the generation of the random numbers is not the bulk of the thread's work (that is // presumably some more code...
in the example exists and doesn't take a trivial amount of time to complete). The critical section only needs to encompass the access to the shared variable, changing your architecture a little would allow you finer control over that (and in the thread-local storage case, could also allow you to avoid passing an eng
reference around)