Random numbers, C++11 vs Boost

I want to generate pseudo-random numbers in C++, and the two likely options are the feature of C++11 and the Boost counterpart. They are used in essentially the same way, but the native one in my tests is roughly 4 times slower.

Is that due to design choices in the library, or am I missing some way of disabling debug code somewhere?

Update: Code is here, https://github.com/vbeffara/Simulations/blob/master/tests/test_prng.cpp and looks like this:

cerr << "boost::bernoulli_distribution ...      \ttime = ";
s=0; t=time();
boost::bernoulli_distribution<> dist(.5);
boost::mt19937 boostengine;
for (int i=0; i<n; ++i) s += dist(boostengine);
cerr << time()-t << ",  \tsum = " << s << endl;

cerr << "C++11 style ...                        \ttime = ";
s=0; t=time();
std::bernoulli_distribution dist2(.5);
std::mt19937_64 engine;
for (int i=0; i<n; ++i) s += dist2(engine);
cerr << time()-t << ",  \tsum = " << s << endl;

(Using std::mt19937 instead of std::mt19937_64 makes it even slower on my system.)

That’s pretty scary.

Let’s have a look:

boost::bernoulli_distribution<>

if(_p == RealType(0))
    return false;
else
    return RealType(eng()-(eng.min)()) <= _p * RealType((eng.max)()-(eng.min)());

std::bernoulli_distribution

__detail::_Adaptor<_UniformRandomNumberGenerator, double> __aurng(__urng);
if ((__aurng() - __aurng.min()) < __p.p() * (__aurng.max() - __aurng.min()))
    return true;
return false;

Both versions invoke the engine and check if the output lies in a portion of the range of values proportional to the given probability.

The big difference is, that the gcc version calls the functions of a helper class _Adaptor.

This class’ min and max functions return 0 and 1 respectively and operator() then calls std::generate_canonical with the given URNG to obtain a value between 0 and 1.

std::generate_canonical is a 20 line function with a loop – which will never iteratate more than once in this case, but it adds complexity.

Apart from that, boost uses the param_type only in the constructor of the distribution, but then saves _p as a double member, whereas gcc has a param_type member and has to “get” the value of it.

This all comes together and the compiler fails in optimizing. Clang chokes even more on it.

If you hammer hard enough you can even get std::mt19937 and boost::mt19937 en par for gcc.

It would be nice to test libc++ too, maybe i’ll add that later.

tested versions: boost 1.55.0, libstdc++ headers of gcc 4.8.2
line numbers on request^^