I'm working on a program that runs Monte Carlo simulation; specifically, I'm using a Metropolis algorithm. The program needs to generate possibly billions of "random" numbers. I know that the Mersenne twister is very popular for Monte Carlo simulation, but I would like to make sure that I am seeding the generator in the best way possible.
Currently I'm computing a 32-bit seed using the following method:
mt19937_64 prng; //pseudo random number generator
unsigned long seed; //store seed so that every run can follow the same sequence
unsigned char seed_count; //to help keep seeds from repeating because of temporal proximity
unsigned long genSeed() {
return ( static_cast<unsigned long>(time(NULL)) << 16 )
| ( (static_cast<unsigned long>(clock()) & 0xFF) << 8 )
| ( (static_cast<unsigned long>(seed_count++) & 0xFF) );
}
//...
seed = genSeed();
prng.seed(seed);
I have a feeling there are much better ways to assure non-repeating new seeds, and I'm quite sure mt19937_64 can be seeded with more then 32-bits. Does anyone have any suggestions?
How about...
There is some main code that starts the threads and there are copies of a function run in those threads, each copy with it's own Marsenne Twister. Am I correct? If it is so, why not use another random generator in the main code? It would be seeded with time stamp, and send it's consecutive pseudorandom numbers to function instances as their seeds.
As far as I can tell from your comments, it seems that what you are interested in is ensuring that if a process starts several of your simulations at exactly the same time, they will get different seeds.
The only significant problem I can see with your current approach is a race condition: if you are going to start multiple simulations simultaneously, it must be done from separate threads. If it is done from separate threads, you need to update
seed_count
in a thread-safe manner, or multiple simulations could end up with the sameseed_count
. You could simply make it anstd::atomic<int>
to solve that.Beyond that, it just seems more complicated than it has to be. What do you gain by using two separate timers? You could do something as simple as this:
seed_count
.From the comments I understand you want to run several instances of the algorithm, one instance per thread. And given that the seed for each instance will be generated pretty much at the same time, you want to ensure that these seeds are different. If that is indeed what you are trying to solve, then your genSeed function will not necessarily guarantee that.
In my opinion, what you need is a parallelisable random number generator (RNG). What this means, is that you only need one RNG which you instantiate with only one seed (which you can generate with your genSeed) and then the sequence of random numbers that would normally be gerenated in a sequential environment is split in X non-overlapping sequences; where X is the number of threads. There is a very good library which provides these type of RNGs in C++, follows the C++ standard for RNGs, and is called TRNG(http://numbercrunch.de/trng).
Here is a little more information. There are two ways you can achieve non-overlapping sequences per thread. Let's assume that the sequence of random numbers from a single RNG is r = {r(1), r(2), r(3),...} and you have only two threads. If you know in advance how many random numbers you will need per thread, say M, you can give the first M of the r sequence to the first thread, ie {r(1), r(2),..., r(M)}, and the second M to the second thread, ie {r(M+1), r(M+2),...r(2M)}. This technique is called blocksplitting since you split the sequence in two consecutive blocks.
The second way is to create the sequence for the first thread as {r(1), r(3), r(5), ...} and for the second thread as {r(2), r(4), r(6),...}, which has the advantage that you do not need to know in advance how many random numbers you will need per thread. This is called leapfroging.
Note that both methods guarantee that the sequences per thread are indeed non-overlapping. The link I posted above has many examples and the library itself is extremely easy to use. I hope my post helps.
Use
std::random_device
to generate the seed. It'll provide non-deterministic random numbers, provided your implementation supports it. Otherwise it's allowed to use some other random number engine.operator()
ofstd::random_device
returns anunsigned int
, so if your platform has 32-bitint
s, and you want a 64-bit seed, you'll need to call it twice.Another available option is using
std::seed_seq
to seed the PRNG. This allows the PRNG to callseed_seq::generate
, which produces a non-biased sequence over the range [0 ≤ i < 232), with an output range large enough to fill its entire state.I'm calling the
random_device
4 times to create a 4 element initial sequence forseed_seq
. However, I'm not sure what the best practice for this is, as far as length or source of elements in the initial sequence is concerned.Let's recap (comments too), we want to generate different seeds to get independent sequences of random numbers in each of the following occurrences:
1 is solved using time since epoch, 2 is solved with a global atomic counter, 3 is solved with a platform dependent id (see How to obtain (almost) unique system identifier in a cross platform way?)
Now the point is what is the best way to combine them to get a
uint_fast64_t
(the seed type ofstd::mt19937_64
)? I assume here that we do not know a priori the range of each parameter or that they are too big, so that we cannot just play with bit shifts getting a unique seed in a trivial way.A
std::seed_seq
would be the easy way to go, however its return typeuint_least32_t
is not our best choice.A good 64 bits hasher is a much better choice. The STL offers
std::hash
under thefunctional
header, a possibility is to concatenate the three numbers above into a string and then passing it to the hasher. The return type is asize_t
which on 64 machines is very likely to match our requirements.Collisions are unlikely but of course possible, if you want to be sure to not build up statistics that include a sequence more than once, you can only store the seeds and discard the duplicated runs.
A
std::random_device
could also be used to generate the seeds (collisions may still happen, hard to say if more or less often), however since the implementation is library dependent and may go down to a pseudo random generator, it is mandatory to check the entropy of the device and avoid to a use zero-entropy device for this purpose as you will probably break the points above (especially point 3). Unfortunately you can discover the entropy only when you take the program to the specific machine and test with the installed library.The POSIX function
gettimeofday(2)
gives the time with microsecond precision.The POSIX thread function
gettid(2)
returns the ID number of the current thread.You should be able to combine the time in seconds since the epoch (which you are already using), the time in microseconds, and the thread ID to get a seed which is always unique on one machine.
If you also need it to be unique across multiple machines, you could consider also getting the hostname, the IP address, or the MAC address.
I would guess that 32 bits is probably enough, since there are over 4 billion unique seeds available. Unless you are running billions of processes, which doesn't seem likely, you should be alright without going to 64 bit seeds.