Deciding between random_device and seed_seq to gen

2020-06-28 07:47发布

When writing code that requires multiple independent random number distributions/sequences (example below with two), it seems that there are two typical ways to implement (pseudo-)random number generation. One is simply using a random_device object to generate two random seeds for the two independent engines:

std::random_device rd;
std::mt19937 en(rd());
std::mt19937 en2(rd());
std::uniform_real_distribution<> ureald{min,max};
std::uniform_int_distribution<> uintd{min,max};

The other involves using the random_device object to create a seed_seq object using multiple "sources" of randomness:

// NOTE: keeping this here for history, but a (hopefully) corrected version of
// this implementation is posted below the edit
std::random_device rd;
std::seed_seq seedseq{rd(), rd(), rd()}; // is there an optimal number of rd() to use?
std::vector<uint32_t> seeds(5);
seedseq.generate(seeds.begin(), seeds.end());
std::mt19937 en3(seeds[0]);
std::mt19937 en4(seeds[1]);
std::uniform_real_distribution<> ureald{min,max};
std::uniform_int_distribution<> uintd{min,max};

Out of these two, is there a preferred method? Why? If it is the latter, is there an optimal number of random_device "sources" to use in generating the seed_seq object?

Are there better approaches to random number generation than either of these two implementations I've outlined above?

Thank you!


Edit

(Hopefully) corrected version of seed_seq implementation for multiple distributions:

std::random_device rd;
std::seed_seq seedseq1{rd(), rd(), rd()}; // is there an optimal number of rd() to use?
std::seed_seq seedseq2{rd(), rd(), rd()};
std::mt19937 en3(seedseq1);
std::mt19937 en4(seedseq2);
std::uniform_real_distribution<> ureald{min,max};
std::uniform_int_distribution<> uintd{min,max};

标签: c++ c++11 random
1条回答
够拽才男人
2楼-- · 2020-06-28 08:26

std::seed_seq is generally intended to be used if you don't trust the default implementation to properly initialize the state of the engine you're using.

In many ≥C++11 implementations, std::default_random_engine is an alias for std::mt19937, which is a specific variant of the Mersenne Twister Pseudorandom Number Generation algorithm. Looking at the specification for std::mt19937, we see that it has a state of size 624 unsigned integers, which is enough to hold the 19937 bits of state it is intended to encompass (which is how it gets its name). Traditionally, if you seed it with only a single uint32_t value (which is what you would get from calling rd() once, if rd is a std::random_device object), then you're leaving the vast majority of its state uninitialized.

Now, the good news for anyone about to panic about their poorly-seeded Mersenne Twister engines is that if you construct a std::mt19937 with a single uint32_t value (like std::default_random_engine engine{rd()};), the implementation is required to initialize the rest of the state by permutating the original seed value, so while a single invocation of rd() yields a limited range of actual differing engine states, it's still sufficient to at least properly initialize the engine. This will yield a "Good Quality" random number generator.

But if you're worried about the engine not being properly seeded, either for cryptographic reasons (though note that std::mt19937 itself is NOT cryptographically secure!) or simply for statistical reasons, you can use a std::seed_seq to manually specify the entire state, using rd() to fill in each value, so that you can guarantee to a relative degree of confidence that the engine is properly seeded.

For casual use, or scenarios where it's not strictly necessary to achieve high quality random numbers, simply initializing with a single call to std::random_device::operator() is fine.

If you want to use a std::seed_seq, make sure you set it up correctly (the example in your original code is definitely not correct, at least for std::mt19937, and would actually yield much worse results than simply using rd()!). This post on CodeReview contains code which has been vetted properly.

Edit:

For the predefined templates of Mersenne Twister, the state size is always 19968 bits, which is slightly more than what it actually needs, but also the smallest value that can fully represent the range using uint32_t values. This works out to 624 Words of 32-bits each. So if you plan to use a Seed Sequence, you would correctly initialize it with 624 invocations to rd():

//Code copied from https://codereview.stackexchange.com/questions/109260/seed-stdmt19937-from-stdrandom-device
std::vector<uint32_t> random_data(624);
std::random_device source;
std::generate(random_data.begin(), random_data.end(), std::ref(source));
std::seed_seq seeds(random_data.begin(), random_data.end());
std::mt19937 engine(seeds);
//Or:
//std::mt19937_64 engine(seeds);

If you're working with a non-standard instantiation of std::mersenne_twister_engine, the state size needed for that specific situation can be queried by multiplying its state_size by its word_size and then dividing by 32.

using mt_engine = std::mersenne_twister_engine</*...*/>;
constexpr size_t state_size = mt_engine::state_size * mt_engine::word_size / 32;
std::vector<uint32_t> random_data(state_size);
std::random_device source;
std::generate(random_data.begin(), random_data.end(), std::ref(source));
std::seed_seq seeds(random_data.begin(), random_data.end());
mt_engine engine (seeds);

For other engine types, you'll need to evaluate them on a case-by-case basis. std::linear_congruential_engine and its predefined variants use a single integer of its word size, so they only require a single invocation of rd() to initialize, and thus Seed Sequences are unnecessary. I'm not sure how std::subtract_with_carry_engine or its associated-by-use std::discard_block_engine work, but it seems like they also only contain a single Word of state.

查看更多
登录 后发表回答