Using the random class, and a seed of time(NULL) the uniform distribution always gives the same first output, even with different compilings, but after the first output behaves as you would expect a pseudorandom number generator to behave.
Is this by construction, or am I using it incorrectly?
MWE:
#include <ctime>
#include <iostream>
#include <random>
using namespace std;
default_random_engine gen(time(NULL));
uniform_int_distribution<int> dist(10,200);
int main()
{
for(int i = 0; i < 5; i++)
cout<<dist(gen)<<endl;
return 0;
}
The first three times I ran this program I get outputs of:
57
134
125
136
112
Before the second try I decided to delete empty line between uniform_int_distribution
and int main()
just to see if the seed was based on compile time, as you can see, that didn't matter.
57
84
163
42
146
Just running again:
57
73
181
160
46
So on my runs I keep getting 57
first, which of course isn't the end of the world, if I want different outputs I can throw away the first output. But it brings into question whether this is by design (if so why?) or if I am misusing the generator somehow (if so how?).
I'm not sure what's going wrong (yet!), but you can still initialize by time as follows without hitting the problem (borrowed from here).
#include <ctime>
#include <iostream>
#include <random>
#include <chrono>
using namespace std;
unsigned seed1 = std::chrono::system_clock::now().time_since_epoch().count();
default_random_engine gen(seed1); //gen(time(NULL));
uniform_int_distribution<int> dist(10,200);
int main()
{
for(int i = 0; i < 5; i++)
cout<<dist(gen)<<endl;
return 0;
}
You can also use the random device, which is non-determinstic (it steals timing information from your key strokes, mouse movements, and other sources to generate unpredictable numbers). This is the strongest seed you can choose but the computer's clock is the better way to go if you don't need strong guarantees because the computer can run out of "randomness" if you use it to often (it takes many key strokes and mouse movements to generate a single truly random number).
std::random_device rd;
default_random_engine gen(rd());
Running
cout<<time(NULL)<<endl;
cout<<std::chrono::system_clock::now().time_since_epoch().count()<<endl;
cout<<rd()<<endl;
on my machine generates
1413844318
1413844318131372773
3523898368
so the chrono
library is providing a significantly larger number and more rapidly changing number (that's in nanoseconds) than the ctime
library. The random_device
is producing non-deterministic numbers which are all over the map. So it seems as though the seeds ctime
is producing may be too close together somehow and thus map partly to the same internal state?
I made another program which looks like this:
#include <iostream>
#include <random>
using namespace std;
int main(){
int oldval = -1;
unsigned int oldseed = -1;
cout<<"Seed\tValue\tSeed Difference"<<endl;
for(unsigned int seed=0;seed<time(NULL);seed++){
default_random_engine gen(seed);
uniform_int_distribution<int> dist(10,200);
int val = dist(gen);
if(val!=oldval){
cout<<seed<<"\t"<<val<<"\t"<<(seed-oldseed)<<endl;
oldval = val;
oldseed = seed;
}
}
}
As you can see, this simply prints out the first output value for every possible random seed up to the current time along with the seed and number of previous seeds which had the same value. An excerpt of the output looks like this:
Seed Value Seed Difference
0 10 1
669 11 669
1338 12 669
2007 13 669
2676 14 669
3345 15 669
4014 16 669
4683 17 669
5352 18 669
6021 19 669
6690 20 669
7359 21 669
8028 22 669
8697 23 669
9366 24 669
10035 25 669
10704 26 669
11373 27 669
12042 28 669
12711 29 669
13380 30 669
14049 31 669
So for every new first number there are 669 seeds which give that first number. Because the second and third numbers are different we are still generating unique internal states. I think we would have to understand much more about the default_random_engine
in order to understand what is special about 669 (which can be factored into 3 and 223).
Given this, it's clear why the chrono
and random_device
libraries work better: the seeds they generate are always more than 669 apart. Keep in mind that even if the first number is the same what matters in many programs is that the sequence of numbers generated by distinct.
Using std::default_random_engine is like saying "Surprise me!" in a bad restaurant. The only thing you know for certain is that the result will be poor - since the generators provided by <random>
are all deficient - but you won't even know which particular defects you have to deal with.
The Mersenne Twister can be a decent choice if - and only if - it is properly seeded, and therein lies the rub. Ideally, every bit of the seed should affect every bit of the resulting generator state with equal probability; as you discovered, that is just not the case in common implementations of std::mersenne_twister_engine.
Mersenne Twisters are normally initialised with the output of a simpler PRNG that has in turn been seeded by whatever entropy is available. This effectively stretches the seed entropy of the simpler PRNG over the huge state of the twister. The makers of the standard thoughtfully provided the seed_seq interface for this purpose; however, it seems that the library does not contain any adapters for using a generator as a seed sequence.
There is also the discrepancy between two different conceptions of seeding. On the generator side, the seeding function should take the entropy that was passed in and map it faithfully onto the generator state, ensuring that no entropy is lost in the process. On the user side, it's "Take these numbers and give me wildly different sequences", where 'these numbers' is { 1, 2, 3, ... } or clock() output.
In other words, the seed entropy is offered in a form that is not suitable for initialising generator state directly; small seed differences give small state differences. This is especially problematic with huge lagged generators like the Mersenne Twister or the lagged Fibonacci that powers the std::ranluxXX generators.
A bit-mixing function - a bijective function where every bit of the output depends on every bit of the input with equal probability - can help with making seeds like 1, 2, 3 or clock() output more useful for seeding. The murmur hash mixer comes close to this ideal, by achieving almost perfect diffusion (32-bit version shown):
uint32_t murmur_mix32 (uint32_t x)
{
x ^= x >> 16;
x *= 0x85EBCA6B;
x ^= x >> 13;
x *= 0xC2B2AE35;
x ^= x >> 16;
return x;
}
The function is bijective, hence it does not lose any entropy at all. This means you can use it to improve any seed without any danger of making things worse.
Another quick fix - without the effort of making a seed_seq - is to call discard() on the generator with a parameter that depends on the (murmur-mixed) seed. However, the effect on huge generators like the Mersenne Twister is somewhat limited, since their state evolves extremely slowly and they need hundreds of thousands of iterations for full recovery from deficient states.
the seed you are using could be introducing a bias, if using a different seed yields the same results then the generator itself is not properly written.
I would suggest testing with different seeds to draw a conclusion.