How to unit test a pseudo random number generator?

2019-01-22 11:50发布

站内文章 / 后端开发

41 0

啃猪蹄的小仙女

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I have a pseudo random number generator (PRNG) class that I want to unit test. There are two approaches:

write a test case that takes a large amount of samples and test whether they are properly distributed. This approach may lead to a fairly long execution time for the test case;
calculate a small series of samples 'by hand' and verify if the PRNG algorithm reproduces it. This approach may lead to a not random sequence being generated without being noticed;

I would say that the first approach is not really unit testing because it does not perform a white box test of the generator, but on the other hand it properly tests the responsibility of the class. The second approach is more like a real unit test, focusing on the algorithm, but it does not provide as much evidence as to whether the class fulfills its responsibility.

Which approach do you prefer, and why?

回答1:

Get another implementation of the same PRNG algorithm, generate a smallish number of longish test cases based on known seeds, and verify that your implementation of the algorithm matches everyone else's implementations. The more data you test, the more chance it does. If you want to be serious, look into how FIPS validation is done for the algorithm.

There's no need to test whether the output is random, since far more research has been done on the algorithm by others than you are capable of reproducing.

If you have invented your own PRNG algorithm then you have a rather different problem, because quite aside from testing your code you also need to test your new algorithm. There are various things to do -- I think the most important are statistical testing on the output, and peer review by other cryptographers. Basically, though, if you were to design a PRNG algorithm without having enough knowledge in the field to know how to test it, then it will be rubbish.

回答2:

For testing a PRNG, I would use ENT which is a suite of statistical tests that will tell you how well your PRNG performs. I suppose this is approach 1.

回答3:

I would imagine that ultimately you would want both tests - because you want to be sure that both the following hold true:

(1) the numbers are properly distributed and (2) the specific algorithm is working as expected.

Perhaps the first test could only be run occasionally, whereas the second could be use to check that any code changes hadn't broken the algorithm.

回答4:

I believe your first point (#1) gets more at testing the quality of the random numbers generated, which is dictated by the algorithm being used. The second point (#2) gets more at testing the implementation of the algorithm. If you designed the algorithm, both tests are important. If you implemented an algorithm of demonstrated performance, #2 should be sufficient. Although, I would probably test more than a few seeds and the sequences that resulted using some knowledge of the structure of your particular generator.

回答5:

The "randomness" in a pseudo-random number generator is generally expressed as the average number of iterations before a number repeats. There are numerous algorithms that have varying "randomness" and performance. Random.org has a good explanation of some analysis done on their algorithms. Check out the pictures about halfway down the page. It's easy to see in the static images just how random the two algorithms are.

One feature of a PRNG (a true feature, not a bug disguised as a feature) is that it is predictable. For a given seed, the same sequence of numbers should be produced. This is extremely important and necessary for testing and debugging programs that use stochastic (aka, random) methods.

The number sequence should approximate a certain statistical distribution. Test your PRNG by generating a large sequence (say 10^6 numbers) and perform several statistical tests on the sequence, particularly the Chi-Squared test (if the distribution is normal). Make a histogram of your sample and see if it looks like you expect.

If you control how the seed is set, the sequence generated should be the same every time, which is suitable for doing white-box testing. When doing your tests, it's also a good idea to "warm up" the generator by running it for 100 iterations or so before collecting your sample.

回答6:

Here's a CodeProject article that includes an implementation of the Kolmogorov-Smirnov test mentioned in Donald Knuth's volume 2, Seminumerical Algorithms. As InSciTek Jeff mentioned above, there are two issues: testing the algorithm and testing your implementation of the algorithm. The K-S test is likely to find bugs in the implementation, and it's a good start toward testing the quality of the algorithm itself.

回答7:

Plesae be aware: If you have 'invented' your PRNG, you will probably have got it wrong and produced something that has a less than optimal distribution. The basic test for how random your generator is, is the Chi-Square test

回答8:

You may find some of the responses to this question useful.

Basically, you can't "prove" that the RNG is random, but you can perform various tests to improve your confidence that it is. These tests vary in complexity. Diehard is comprehensive, but it doesn't really offer a yes/no answer, more like a couple of hundred "maybes". At the other end of the spectrum, it's pretty simple to generate a sequence of values (10,000 at least) and then check whether the mean and standard deviaton/variance are as expected.

回答9:

Strictly, there's no way to test if the random generator is really random :-) First approach gives you the knowledge of can it keed distribution proper or not for a fixed amount of samples only, no matter how big this ammount is. The second approach can suport knowlege of does it behave like an algorythm, but again for a fixed ammount of samples.

The best you can do - use both.

回答10:

Way back in school, I was developing a random number generator for a simulation assignment, and needed some way to identify the non-randomness.

I got the bright idea of taking two random numbers and graphing them (x,y). It's amazing how the human brain can detect non-random patterns. ("Random Pattern" is an oxymoron.)

I tweaked the PRNG to remove the striations and starbursts that appeared on the graph, then plotted (log(x), log(y)) for a whole new perspective and repeated the process.

Later, I was forced to take statistics and learned that you can do weird math to quantify randomness.

回答11:

Random.org uses this test suit: http://csrc.nist.gov/groups/ST/toolkit/rng/documentation_software.html

You can download the software (unix and mac os x) in: http://csrc.nist.gov/groups/ST/toolkit/rng/documents/sts-2.1.1.zip

The documentation is here: http://csrc.nist.gov/groups/ST/toolkit/rng/documents/SP800-22rev1a.pdf

回答12:

One method is to pipe its output to PractRand.

If PractRand says a PRNG's output is OK, is the PRNG really OK? I'm not qualified to judge, but what I can say is that PRNG is exacting enough that it deemed unsatisfactory the output of various LFSRs and xor-and-shift algorithms I'd found in the literature or online, and deemed OK the output of R. P. Brent's xorgens.

回答13:

Unless you are implementing a given PNRG algorithm there is no way to tell that the numbers are random, thats the nature of randomness. yeah on average as you number generator goes towards infinity it will even out, but you aren't going to test an infinite number of iterations.

If you are implementing a known algorithm, check the first few thousand numbers match the results they provide given a set of seeds. there is no way to be sure though as the number of possible seeds is infinite.

You can't even proove mathmatically that a sequence of numbers is random...

XKCD