可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

void NetClass::Modulate(vector <synapse> & synapses )
{
    int size = synapses.size();
    int split = 200 * 0.5;

    for(int w=0; w < size; w++)
        if(synapses[w].active)
            synapses[w].rmod = ((rand_r(seedp) % 200 - split ) / 1000.0);
}

The function rand_r(seedp) is seriously bottle-necking my program. Specifically, its slowing me by 3X when run serialy, and 4.4X when run on 16 cores. rand() is not an option because its even worse. Is there anything I can do to streamline this? If it will make a difference, I think I can sustain a loss in terms of statistical randomness. Would pre-generating (before execution) a list of random numbers and then loading to the thread stacks be an option?

回答1:

Problem is that seedp variable (and its memory location) is shared among several threads. Processor cores must synchronize their caches each time they access this ever changing value, which hampers performance. The solution is that all threads work with their own seedp, and so avoid cache synchronization.

回答2:

Marsaglia's xor-shift generator is the probably fastest "reasonable quality" generator that you can use. It does not quite have the same "quality" as MT19937 or WELL, but honestly these differences are academic sophistries.
For all real, practical uses, there is no observable difference, except 1-2 orders of magnitude difference in execution speed, and 3 orders of magnitude of difference in memory consumption.

The xor-shift generator is also naturally thread-safe (in the sense that it will produce non-deterministic, pseudorandom results, and it will not crash) without anything special, and it can be trivially made thread-safe in another sense (in the sense that it will generate per-thread independent, deterministic, pseudorandom numbers) by having one instance per thread.
It could also be made threadsafe in yet another sense (generate a deterministic, pseudorandom sequence handed out to threads as they come) using atomic compare-exchange, but I don't think that's very useful.

The only three notable issues with the xor-shift generator are:

It is not k-distributed for up to 623 dimensions, but honestly who cares. I can't think in more than 4 dimensions (and even that's a lie!), and can't imagine many applications where more than 10 or 20 dimensions could possibly matter. That would have to be some quite esoteric simulation.
It passes most, but not ever pedantic statistic test. Again, who cares. Most people use a random generator that does not even pass a single test and never notice.
A zero seed will produce a zero sequence. This is trivially fixed by adding a non-zero constant to one of the temporaries (I wonder why Marsaglia never thought of that?). Having said that, MT19937 also behaves extremely badly given a zero seed, and does not recover nearly as well.

回答3:

It depends on how good the statistical randomness needs to be. For high quality, the Mersenne twister, or its SIMD variant, is a good choice. You can generate and buffer a large block of pseudo-random numbers at a time, and each thread can have its own state vector. The Park-Miller-Carta PRNG is extremely simple - these guys even implemented it as a CUDA kernel.

回答4:

Have a look at Boost: http://www.boost.org/doc/libs/1_47_0/doc/html/boost_random.html It has a number of options which vary in complexity (= speed) and randomness (cycle length).

If you don't need maximum randomness, you might get away with a simple Mersenne Twister.

回答5:

do you absolutely need to have 1 shared random?

I had a similar contention problem a while ago, the solution that worked best for me was to create a new Random class (I was working in C#) for each thread. they're dead cheap anyway.

If you seed them properly to make sure you don't create duplicate seeds you should be fine. Then you won't have shared state so you don't need to use the threadsafe function.

Regards GJ

回答6:

maybe you don't have to call it in every iteration? you could initialize an array of pre-randomized elements and make successive use of it...

回答7:

I think you can use OpenMP for paralleling like this:

#pragma omp parallel
for(int w=0; w < size; w++)

C++ super fast thread-safe rand function

问题:

回答1:

回答2:

回答3:

回答4:

回答5:

回答6:

回答7:

收藏的人(0)

C++ super fast thread-safe rand function

问题:

回答1:

回答2:

回答3:

回答4:

回答5:

回答6:

回答7:

收藏的人(0)

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮