void NetClass::Modulate(vector<synapse>& synapses)
{
    int size  = synapses.size();
    int split = 200 * 0.5;          // = 100, centers the range around zero

    for (int w = 0; w < size; w++)
        if (synapses[w].active)
            // uniform value in roughly [-0.1, 0.1)
            synapses[w].rmod = ((rand_r(seedp) % 200 - split) / 1000.0);
}
The function rand_r(seedp) is seriously bottlenecking my program. Specifically, it's slowing me down by 3x when run serially, and by 4.4x when run on 16 cores. rand() is not an option because it's even worse. Is there anything I can do to streamline this? If it will make a difference, I think I can sustain a loss in statistical randomness. Would pre-generating (before execution) a list of random numbers and then loading them onto the thread stacks be an option?
It depends on how good the statistical randomness needs to be. For high quality, the Mersenne twister, or its SIMD variant, is a good choice. You can generate and buffer a large block of pseudo-random numbers at a time, and each thread can have its own state vector. The Park-Miller-Carta PRNG is extremely simple - these guys even implemented it as a CUDA kernel.
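For illustration, here is a minimal sketch of the per-thread buffering idea (not part of the original answer, and assuming a C++11 compiler with <random>): each thread owns its own Mersenne twister state and refills a local block of values at a time, so the generator's cost is amortized and nothing is shared between cores. The class name RandomBlock and the block size are arbitrary choices.

#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: one instance per thread, so no state is shared.
class RandomBlock
{
public:
    explicit RandomBlock(unsigned seed, std::size_t blockSize = 4096)
        : engine(seed), dist(-0.1, 0.1), buffer(blockSize), pos(blockSize) {}

    // Returns the next buffered value, refilling the whole block when it runs out.
    double next()
    {
        if (pos == buffer.size()) {
            for (std::size_t i = 0; i < buffer.size(); ++i)
                buffer[i] = dist(engine);
            pos = 0;
        }
        return buffer[pos++];
    }

private:
    std::mt19937 engine;                          // Mersenne twister, one state vector per thread
    std::uniform_real_distribution<double> dist;  // roughly the range of (rand_r % 200 - 100) / 1000.0
    std::vector<double> buffer;                   // pre-generated block
    std::size_t pos;                              // next unread position
};

Each thread would construct its own RandomBlock with a distinct seed and call next() in the inner loop instead of rand_r.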
Do you absolutely need to have one shared random generator?
I had a similar contention problem a while ago; the solution that worked best for me was to create a new Random instance (I was working in C#) for each thread. They're dead cheap anyway.
If you seed them properly, so that you don't create duplicate seeds, you should be fine. Then you won't have any shared state, so you don't need the thread-safe function.
Regards GJ
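The same idea translated to C++ as a rough sketch (the original answer was C#, and this assumes a C++11 compiler with thread_local support): each thread lazily gets its own engine, seeded from the clock mixed with the thread id so that concurrent threads don't end up with duplicate seeds.

#include <chrono>
#include <random>
#include <thread>

// Sketch only: one engine per thread, seeded once on first use in that thread.
double NextRmod()
{
    thread_local std::mt19937 engine(
        static_cast<unsigned>(
            std::chrono::high_resolution_clock::now().time_since_epoch().count()) ^
        static_cast<unsigned>(std::hash<std::thread::id>()(std::this_thread::get_id())));
    thread_local std::uniform_real_distribution<double> dist(-0.1, 0.1);
    return dist(engine);   // roughly the range of the original (rand_r(seedp) % 200 - 100) / 1000.0
}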
Have a look at Boost: http://www.boost.org/doc/libs/1_47_0/doc/html/boost_random.html It has a number of options which vary in complexity (= speed) and randomness (cycle length). If you don't need maximum randomness, you might get away with a simple Mersenne Twister.
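A hedged sketch of what that could look like with the Boost.Random headers from the linked 1.47 documentation (one engine per thread; the seed value here is just a placeholder):

#include <boost/random/mersenne_twister.hpp>
#include <boost/random/uniform_real_distribution.hpp>

int main()
{
    boost::random::mt19937 engine(42u);   // seed each thread's engine differently
    boost::random::uniform_real_distribution<double> dist(-0.1, 0.1);

    double r = dist(engine);   // roughly the range of (rand_r(seedp) % 200 - 100) / 1000.0
    (void)r;                   // silence the unused-variable warning in this standalone sketch
    return 0;
}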
Maybe you don't have to call it in every iteration? You could initialize an array of pre-generated random values and make successive use of it...
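A small sketch of that approach (the names and table size are arbitrary), which trades statistical quality for speed, as the asker said would be acceptable: fill the table once before the run, then just cycle through it with a per-thread cursor.

#include <cstddef>
#include <cstdlib>
#include <vector>

static std::vector<double> table;   // filled once, then only ever read

void FillTable(unsigned int seed, std::size_t n)
{
    table.resize(n);
    for (std::size_t i = 0; i < n; ++i)
        table[i] = ((rand_r(&seed) % 200) - 100) / 1000.0;
}

// Each thread keeps its own cursor, so the hot loop touches no shared, mutable state.
inline double NextFromTable(std::size_t& cursor)
{
    double v = table[cursor];
    if (++cursor == table.size())
        cursor = 0;                 // wrap around and reuse the values
    return v;
}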
I think you can use OpenMP to parallelize this; a sketch is given below the explanation.
The problem is that the seedp variable (and its memory location) is shared among several threads. The processor cores must synchronize their caches each time they access this ever-changing value, which hampers performance. The solution is for every thread to work with its own seedp, which avoids the cache synchronization.
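A hedged sketch of the OpenMP version this answer describes, applied to the Modulate loop from the question (it assumes seedp is an accessible unsigned int* member of NetClass, as implied by the original call to rand_r(seedp)): every thread derives its own private seed once, so the inner loop touches no shared random state.

#include <cstdlib>
#include <vector>
#include <omp.h>

void NetClass::Modulate(std::vector<synapse>& synapses)
{
    const int size  = static_cast<int>(synapses.size());
    const int split = 100;   // 200 * 0.5

    #pragma omp parallel
    {
        // Private copy per thread: offsetting by the thread number keeps the seeds distinct.
        unsigned int seed = *seedp + static_cast<unsigned int>(omp_get_thread_num());

        #pragma omp for
        for (int w = 0; w < size; w++)
            if (synapses[w].active)
                synapses[w].rmod = ((rand_r(&seed) % 200 - split) / 1000.0);
    }
}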