stl random distributions and portability

2020-01-29 15:10发布

问题:

Why is it that the result of standard distributions isn't mandated to be consistent across implementations? The result of pseudo random number generators is on the other hand mandated to be identical.

For example, the following will almost certainly print something different for every different standard library implementation.

std::mt19937 random {100};
std::normal_distribution<> dist;

std::cout << dist(random);

Say I want to do procedural generation and would like identical starting seeds to result in identical results across platforms and compilers. I can't do it with the stl. I have to "regress" to using boost. Why isn't this a defect?

回答1:

This is not a defect, it is by design. The rationale for this can be found in A Proposal to Add an Extensible Random Number Facility to the Standard Library (N1398) which says (emphasis mine):

On the other hand, the specifications for the distributions only define the statistical result, not the precise algorithm to use. This is different from engines, because for distribution algorithms, rigorous proofs of their correctness are available, usually under the precondition that the input random numbers are (truely) uniformly distributed. For example, there are at least a handful of algorithms known to produce normally distributed random numbers from uniformly distributed ones. Which one of these is most efficient depends on at least the relative execution speeds for various transcendental functions, cache and branch prediction behaviour of the CPU, and desired memory use. This proposal therefore leaves the choice of the algorithm to the implementation. It follows that output sequences for the distributions will not be identical across implementations. It is expected that implementations will carefully choose the algorithms for distributions up front, since it is certainly surprising to customers if some distribution produces different numbers from one implementation version to the next.

This point is reiterated in the implementation defined section which says:

The algorithms how to produce the various distributions are specified as implementation-defined, because there is a vast variety of algorithms known for each distribution. Each has a different trade-off in terms of speed, adaptation to recent computer architectures, and memory use. The implementation is required to document its choice so that the user can judge whether it is acceptable quality-wise.