Should I call reset() on my C++ std random distrib

Posted 2019-01-20 08:33

Question:

I would like to wrap the random number distributions from the C++11 standard library with simple functions that take as arguments the distribution's parameters and a generator instance. For example:

#include <random>

double normal(double mean, double sd, std::mt19937_64& generator)
{
    // One shared distribution object; the parameters are supplied per call.
    static std::normal_distribution<double> dist;
    return dist(generator, std::normal_distribution<double>::param_type(mean, sd));
}

I want to avoid any hidden state within the distribution object so that each call to this wrapper function only depends on the given arguments. (Potentially, each call to this function could take a different generator instance.) Ideally, I would make the distribution instance static const to ensure this; however, the distribution's operator() is not a const function, so this isn't possible.

My question is this: To ensure there is no hidden state within the distribution, is it 1) necessary and 2) sufficient to call reset() on the distribution each time? For example:

double normal(double mean, double sd, std::mt19937_64& generator)
{
    static std::normal_distribution<double> dist;
    // Discard anything the distribution may have cached from an earlier call.
    dist.reset();
    return dist(generator, std::normal_distribution<double>::param_type(mean, sd));
}

(Overall, I'm confused about the purpose of the reset() function for the random distributions... I understand why the generator would need to be reset/reseeded at times, but why would the distribution object need to be reset?)

Answer 1:

> To ensure there is no hidden state within the distribution, is it 1) necessary

Yes.

> and 2) sufficient to call reset() on the distribution each time?

Yes.

You probably don't want to do this, though, at least not on every single call. std::normal_distribution is the poster child for allowing distributions to maintain state. For example, a popular implementation uses the Box-Muller transformation to compute two random numbers at once but hands back only one of them, saving the other for the next call. Calling reset() before that next call would make the distribution throw away this already valid result and cut the efficiency of the algorithm in half.
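
A rough way to see that cost is to count how often the distribution pulls a value from the engine, with and without reset(). The sketch below is only an illustration: CountingEngine and count_engine_calls are hypothetical names introduced here, and the exact counts depend on the standard library in use, since the standard does not mandate whether or how a value is cached.

```cpp
#include <cstddef>
#include <random>

// Hypothetical helper (not part of the standard library): wraps
// std::mt19937_64 and counts how many times the distribution calls it.
struct CountingEngine {
    using result_type = std::mt19937_64::result_type;

    explicit CountingEngine(result_type seed) : engine(seed) {}

    static constexpr result_type min() { return std::mt19937_64::min(); }
    static constexpr result_type max() { return std::mt19937_64::max(); }

    result_type operator()() {
        ++calls;
        return engine();
    }

    std::mt19937_64 engine;
    std::size_t calls = 0;
};

// Draws `draws` normal variates and returns the number of engine calls used.
// When reset_each_time is true, dist.reset() runs before every draw.
std::size_t count_engine_calls(std::size_t draws, bool reset_each_time) {
    CountingEngine engine(12345);  // arbitrary fixed seed for the illustration
    std::normal_distribution<double> dist(0.0, 1.0);
    for (std::size_t i = 0; i < draws; ++i) {
        if (reset_each_time) {
            dist.reset();
        }
        (void)dist(engine);  // the value itself is not needed here
    }
    return engine.calls;
}
```

On an implementation that caches the second value, count_engine_calls(1000, true) should come out at roughly twice count_engine_calls(1000, false); an implementation that caches nothing would report equal counts.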



Answer 2:

Some distributions have internal state. If you interfere with how the distribution works by constantly resetting it, you won't get properly distributed results. This is just like calling srand() before every call to rand().



Answer 3:

Calling reset() on a distribution object d has the following effect:

> Subsequent uses of d do not depend on values produced by any engine prior to invoking reset.

(An engine is, in short, a generator that can be seeded.)

In other words, it clears any "cached" random data that the distribution object has stored and that depends on output that it has previously drawn from an engine.

So, if you want to do that then you should call reset(). The main reason I can think of that you would want to do that is when you are seeding your engine with a known value with the intention of producing repeatable pseudo-random results. If you want the results from your distribution object to also be repeatable based on that seed, then you need to reset the distribution object (or create a new one).
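
A minimal sketch of that repeatability scenario follows. The seed 42 and the function names draw_fresh and draw_after_reseed are chosen purely for illustration, and the comparison relies on a reset distribution behaving like a newly constructed one, which holds in practice but is an assumption here rather than a quoted guarantee.

```cpp
#include <random>
#include <vector>

// Reference run: a new engine and a new distribution.
std::vector<double> draw_fresh(int count) {
    std::mt19937_64 engine(42);
    std::normal_distribution<double> dist(0.0, 1.0);
    std::vector<double> values;
    for (int i = 0; i < count; ++i) {
        values.push_back(dist(engine));
    }
    return values;
}

// Reuses a distribution that has already been drawn from once, then reseeds
// the engine. Only with reset_distribution == true is the result meant to
// match draw_fresh(); otherwise a cached value may leak into the output.
std::vector<double> draw_after_reseed(int count, bool reset_distribution) {
    std::mt19937_64 engine(42);
    std::normal_distribution<double> dist(0.0, 1.0);
    (void)dist(engine);  // a single draw may leave a second value cached

    engine.seed(42);     // restart the engine from the known seed
    if (reset_distribution) {
        dist.reset();    // drop anything derived from the earlier draw
    }

    std::vector<double> values;
    for (int i = 0; i < count; ++i) {
        values.push_back(dist(engine));
    }
    return values;
}
```

Comparing draw_after_reseed(4, true) with draw_fresh(4) should show identical sequences. Whether draw_after_reseed(4, false) differs depends on whether the implementation caches a value at all.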

Another reason I can think of is that you are defensively reseeding the generator object because you fear that an attacker may gain partial knowledge of its internal state (as Fortuna does, for example). To oversimplify, you can imagine that the quality/security of the generator's data diminishes over time and that reseeding restores it. Since a distribution object can cache arbitrary amounts of data from the generator, there will be an arbitrary delay between increasing the quality/security of the generator's output and increasing the quality/security of the distribution object's output. Calling reset() on the distribution object avoids this delay. But I won't swear to this latter use being correct, because it gets into a realm where I prefer not to make my own judgement about what is secure if I can possibly rely on peer-reviewed work by an expert :-)

With regard to your code in particular: if you don't want the output to depend on previous use of the same dist object with different generator objects, then calling reset() is the way to do that. But I think it's unlikely that calling reset() on a distribution object and then using it with new parameters will be any cheaper than constructing a new distribution object with those parameters. So using a static local object seems to me to make your function non-thread-safe for no benefit: you could create a new distribution object each time and the code would likely be no worse. There are reasons for the design in the standard, and you're expected to use a distribution object repeatedly with the same generator. The function you've written, by cutting the distribution object out of the interface, discards the benefits of that part of the design.
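
As a sketch of that alternative, the wrapper could build a new distribution on every call. This keeps the signature from the question; the assert is an addition for illustration, and the whole thing is a suggestion under the assumptions above, not a measured recommendation.

```cpp
#include <cassert>
#include <random>

// No static object, so nothing carries over between calls and the function
// itself holds no shared state (the generator still belongs to the caller).
double normal(double mean, double sd, std::mt19937_64& generator) {
    assert(sd > 0.0);  // std::normal_distribution expects a positive stddev
    std::normal_distribution<double> dist(mean, sd);
    return dist(generator);
}
```

With this version, two generators seeded identically give the same result for the same arguments, regardless of what other generators were passed to normal() earlier.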