I seem to see many answers in which someone suggests using <random>
to generate random numbers, usually along with code like this:
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<> dis(0, 5);
dis(gen);
Usually this replaces some kind of "unholy abomination" such as:
srand(time(NULL));
rand()%6;
We might criticize the old way by arguing that time(NULL) provides low entropy, time(NULL) is predictable, and the end result is non-uniform.
But all of that is true of the new way: it just has a shinier veneer.
- rd() returns a single unsigned int. This has at least 16 bits and probably 32. That's not enough to seed MT's 19937 bits of state.
- Using std::mt19937 gen(rd()); gen() (seeding with 32 bits and looking at the first output) doesn't give a good output distribution. 7 and 13 can never be the first output. Two seeds produce 0. Twelve seeds produce 1226181350. (Link)
- std::random_device can be, and sometimes is, implemented as a simple PRNG with a fixed seed. It might therefore produce the same sequence on every run. (Link) This is even worse than time(NULL).
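To put numbers on the size mismatch, here is a small sketch that compares the engine's state size with what a single rd() call supplies. The constant names are made up for illustration. Note that mt19937 stores 624 words of 32 bits (19968 bits), of which 19937 are effective state.

```cpp
#include <climits>
#include <cstddef>
#include <random>

// Bits stored in the engine's state array: state_size words of word_size bits each.
constexpr std::size_t mt_state_bits =
    std::mt19937::state_size * std::mt19937::word_size;   // 624 * 32

// Bits supplied by one call to std::random_device::operator().
constexpr std::size_t rd_bits_per_call =
    sizeof(std::random_device::result_type) * CHAR_BIT;

// Number of rd() calls needed to supply as many bits as the state array holds.
constexpr std::size_t rd_calls_for_full_state =
    (mt_state_bits + rd_bits_per_call - 1) / rd_bits_per_call;
```

On a platform where unsigned int is 32 bits wide, rd_calls_for_full_state works out to 624, compared with the single call in the usual snippet.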
Worse yet, it is very easy to copy and paste the foregoing code snippets, despite the problems they contain. Some solutions to this require acquiring largish libraries which may not be suitable for everyone.
In light of this, my question is: How can one succinctly, portably, and thoroughly seed the mt19937 PRNG in C++?
Given the issues above, a good answer:
- Must fully seed the mt19937/mt19937_64.
- Cannot rely solely on std::random_device or time(NULL) as a source of entropy.
- Should not rely on Boost or other libraries.
- Should fit in a small number of lines such that it would look nice copy-pasted into an answer.
Thoughts
My current thought is that outputs from std::random_device can be mashed up (perhaps via XOR) with time(NULL), values derived from address space randomization, and a hard-coded constant (which could be set during distribution) to get a best-effort shot at entropy.
std::random_device::entropy() does not give a good indication of what std::random_device might or might not do.
Here's my own stab at the question:
The idea here is to use XOR to combine many potential sources of entropy (fast time, slow time, std::random_device, static variable locations, heap locations, function locations, library locations, program-specific values) to make a best-effort attempt at initializing the mt19937. As long as at least one source is "good", the result will be at least that "good".
This answer is not as short as would be preferable and may contain one or more mistakes of logic. So I'm considering it a work in progress. Please comment if you have feedback.
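For readers who want something concrete, the XOR idea described above could be sketched roughly as follows. This is not the answerer's original code: best_effort_seed and make_best_effort_engine are hypothetical names, the constant is arbitrary, and only a few of the listed sources are used. It also produces just 64 bits, so it still falls far short of filling the full mt19937 state.

```cpp
#include <chrono>
#include <cstdint>
#include <ctime>
#include <memory>
#include <random>

// Hypothetical helper: XOR several best-effort entropy sources into one 64-bit word.
inline std::uint64_t best_effort_seed() {
    static int static_anchor;                      // static address, may vary under ASLR
    std::unique_ptr<int> heap_anchor(new int(0));  // heap address, may vary too

    std::uint64_t seed = 0x9E3779B97F4A7C15ull;    // arbitrary program-specific constant
    seed ^= static_cast<std::uint64_t>(std::time(nullptr));               // slow time
    seed ^= static_cast<std::uint64_t>(
        std::chrono::high_resolution_clock::now().time_since_epoch().count());  // fast time
    seed ^= static_cast<std::uint64_t>(std::random_device{}()) << 32;     // may be weak
    seed ^= reinterpret_cast<std::uintptr_t>(&static_anchor);
    seed ^= reinterpret_cast<std::uintptr_t>(heap_anchor.get());
    return seed;
}

// Feeds both halves of the combined word to the engine through std::seed_seq.
inline std::mt19937 make_best_effort_engine() {
    const std::uint64_t s = best_effort_seed();
    std::seed_seq seq{static_cast<std::uint32_t>(s),
                      static_cast<std::uint32_t>(s >> 32)};
    return std::mt19937(seq);
}
```

The "at least as good as the best source" argument assumes the sources are independent; correlated inputs (for example two clocks) can partly cancel each other under XOR.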
A given platform might have a source of entropy, such as /dev/random. Nanoseconds since the Epoch with std::chrono::high_resolution_clock::now() is probably the best seed in the Standard Library.
I previously have used something like (uint64_t)( time(NULL)*CLOCKS_PER_SEC + clock() ) to get more bits of entropy for applications that aren’t security-critical.
In a sense, this can't be done portably. That is, one can conceive a valid fully-deterministic platform running C++ (say, a simulator which steps the machine clock deterministically, and with "determinized" I/O) in which there is no source of randomness to seed a PRNG.
I would argue the greatest flaw with std::random_device is that it is allowed a deterministic fallback if no CSPRNG is available. This alone is a good reason not to seed a PRNG using std::random_device, since the bytes produced may be deterministic. It unfortunately doesn't provide an API to find out when this happens, or to request failure instead of low-quality random numbers.
That is, there is no completely portable solution: however, there is a decent, minimal approach. You can use a minimal wrapper around a CSPRNG (defined as sysrandom below) to seed the PRNG.
Windows
You can rely on CryptGenRandom, a CSPRNG. For example, you may use the following code:
Unix-Like
On many Unix-like systems, you should use /dev/urandom when possible (although this is not guaranteed to exist on POSIX-compliant systems).
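A minimal sketch of what such a sysrandom wrapper might look like on a Unix-like system is shown here. The signature and the error handling are assumptions, since the original listing is not reproduced in this text. It uses std::ifstream because the later discussion of buffering refers to an fstream-based version.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>

// Fills dst with dstlen bytes read from /dev/urandom and returns dstlen.
// Throws std::runtime_error if the device cannot be opened or fully read.
inline std::size_t sysrandom(void* dst, std::size_t dstlen) {
    std::ifstream stream("/dev/urandom",
                         std::ios_base::binary | std::ios_base::in);
    if (!stream) {
        throw std::runtime_error("could not open /dev/urandom");
    }
    stream.read(static_cast<char*>(dst), static_cast<std::streamsize>(dstlen));
    if (!stream) {
        throw std::runtime_error("short read from /dev/urandom");
    }
    return dstlen;
}
```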
Other
If no CSPRNG is available, you might choose to rely on std::random_device. However, I would avoid this if possible, since various compilers (most notably, MinGW) implement it as a PRNG (in fact, producing the same sequence every time to alert humans that it's not properly random).
Seeding
Now that we have our pieces with minimal overhead, we can generate the desired bits of random entropy to seed our PRNG. The example uses (an obviously insufficient) 32 bits to seed the PRNG, and you should increase this value (which is dependent on your CSPRNG).
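As an illustration of increasing the seed size, the sketch below pulls one 32-bit word per state word instead of a single 32-bit value. It is not the 32-bit example the text refers to. seeded_mt19937 is a hypothetical name, and the byte source is passed in as a callable with the assumed shape fill(void* dst, std::size_t len), which is the role sysrandom plays here. Routing the words through std::seed_seq is a convenience choice; seed_seq applies its own mixing and is not a perfect one-to-one mapping onto the engine state.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <random>

// Seeds an mt19937 from state_size 32-bit words produced by `fill`.
// `fill` is any callable that writes `len` bytes to `dst`.
template <class Fill>
std::mt19937 seeded_mt19937(Fill fill) {
    std::array<std::uint32_t, std::mt19937::state_size> words;
    fill(words.data(), words.size() * sizeof(words[0]));

    // seed_seq consumes all the words and expands them into the engine state.
    std::seed_seq seq(words.begin(), words.end());
    return std::mt19937(seq);
}
```

With a sysrandom function available, the call would look like `std::mt19937 gen = seeded_mt19937(sysrandom);`.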
Comparison To Boost
We can see parallels to boost::random_device (a true CSPRNG) after a quick look at the source code. Boost uses MS_DEF_PROV on Windows, which is the default provider for the PROV_RSA_FULL provider type. The only thing missing would be verifying the cryptographic context, which can be done with CRYPT_VERIFYCONTEXT. On *Nix, Boost uses /dev/urandom. I.e., this solution is portable, well-tested, and easy to use.
Linux Specialization
If you're willing to sacrifice succinctness for security, getrandom is an excellent choice on Linux 3.17 and above, and on recent Solaris. getrandom behaves identically to /dev/urandom, except it blocks if the kernel hasn't initialized its CSPRNG yet after booting. The following snippet detects if Linux getrandom is available, and if not falls back to /dev/urandom.
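The original listing is not reproduced in this text, so what follows is only a sketch of how such a detect-and-fall-back wrapper might look. It assumes that availability is detected at compile time through the SYS_getrandom macro and at run time through the return value of the raw system call. The signature of sysrandom is also an assumption.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>

#if defined(__linux__)
#include <sys/syscall.h>   // SYS_getrandom, if the kernel headers define it
#include <unistd.h>        // syscall()
#endif

// Fills dst with dstlen random bytes and returns dstlen; throws on failure.
inline std::size_t sysrandom(void* dst, std::size_t dstlen) {
#if defined(__linux__) && defined(SYS_getrandom)
    // flags == 0: same pool as /dev/urandom, blocking only until it is initialized.
    long got = syscall(SYS_getrandom, dst, dstlen, 0);
    if (got >= 0 && static_cast<std::size_t>(got) == dstlen) {
        return dstlen;
    }
    // An older kernel (ENOSYS) or a short read lands here; fall back below.
#endif
    std::ifstream stream("/dev/urandom",
                         std::ios_base::binary | std::ios_base::in);
    if (!stream) {
        throw std::runtime_error("could not open /dev/urandom");
    }
    stream.read(static_cast<char*>(dst), static_cast<std::streamsize>(dstlen));
    if (!stream) {
        throw std::runtime_error("short read from /dev/urandom");
    }
    return dstlen;
}
```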
OpenBSD
There is one final caveat: modern OpenBSD does not have /dev/urandom. You should use getentropy instead.
Other Thoughts
If you need cryptographically secure random bytes, you should probably replace the fstream with POSIX's unbuffered open/read/close. This is because both basic_filebuf and FILE contain an internal buffer, which will be allocated via a standard allocator (and therefore not wiped from memory).
This could easily be done by changing sysrandom to:
Thanks
Special thanks to Ben Voigt for pointing out that FILE uses buffered reads, and therefore should not be used.
I would also like to thank Peter Cordes for mentioning getrandom, and OpenBSD's lack of /dev/urandom.
There's nothing wrong with seeding by using time, assuming you don't need it to be secure (and you didn't say this was necessary). The insight is that you can use hashing to fix the non-randomness. I've found this works adequately in all cases, including, in particular, heavy Monte Carlo simulations.
One nice feature of this approach is that it generalizes to initialization from other not-really-random sets of seeds. For example, if you want each thread to have its own RNG (for threadsafety), you can just initialize based on hashed thread ID.
The following is a SSCCE, distilled from my codebase (for simplicity; some OO support structures elided):
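The answerer's SSCCE is not reproduced in this text. As a stand-in, here is a rough sketch of the hashing idea under my own assumptions: mix32 is the 32-bit finalizer from MurmurHash3, chosen only as a convenient integer mixer, and seed_from_value and hashed_time_seed are hypothetical names. None of this is suitable where security matters.

```cpp
#include <chrono>
#include <cstdint>
#include <random>

// 32-bit integer mixer (the MurmurHash3 finalizer); an illustrative choice.
inline std::uint32_t mix32(std::uint32_t h) {
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

// Hashes an arbitrary 64-bit value (a timestamp, a thread-ID hash, ...) to a seed.
inline std::uint32_t seed_from_value(std::uint64_t value) {
    const std::uint32_t lo = static_cast<std::uint32_t>(value);
    const std::uint32_t hi = static_cast<std::uint32_t>(value >> 32);
    return mix32(lo ^ mix32(hi));
}

// Hashes the current time so that nearby timestamps give unrelated-looking seeds.
inline std::uint32_t hashed_time_seed() {
    const auto ticks = std::chrono::system_clock::now().time_since_epoch().count();
    return seed_from_value(static_cast<std::uint64_t>(ticks));
}
```

Usage would be along the lines of `std::mt19937 gen(hashed_time_seed());`. For a per-thread generator, the value passed to seed_from_value could be `std::hash<std::thread::id>{}(std::this_thread::get_id())`.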
The implementation I am working on takes advantage of the state_size property of the mt19937 PRNG to decide how many seeds to provide on initialization:
I think there is room for improvement because std::random_device::result_type could differ from std::mt19937::result_type in size and range so that should really be taken into account.
A note about std::random_device.
According to the C++11(/14/17) standard(s):
This means the implementation may only generate deterministic values if it is prevented from generating non-deterministic ones by some limitation.
The MinGW compiler on Windows famously does not provide non-deterministic values from its std::random_device, despite them being easily available from the Operating System. So I consider this a bug and not likely a common occurrence across implementations and platforms.