I have a pseudo random number generator (PRNG) class that I want to unit test. There are two approaches:
- write a test case that takes a large amount of samples and test whether they are properly distributed. This approach may lead to a fairly long execution time for the test case;
- calculate a small series of samples 'by hand' and verify if the PRNG algorithm reproduces it. This approach may lead to a not random sequence being generated without being noticed;
I would say that the first approach is not really unit testing because it does not perform a white box test of the generator, but on the other hand it properly tests the responsibility of the class. The second approach is more like a real unit test, focusing on the algorithm, but it does not provide as much evidence as to whether the class fulfills its responsibility.
Which approach do you prefer, and why?
For testing a PRNG, I would use ENT which is a suite of statistical tests that will tell you how well your PRNG performs. I suppose this is approach 1.
I believe your first point (#1) gets more at testing the quality of the random numbers generated, which is dictated by the algorithm being used. The second point (#2) gets more at testing the implementation of the algorithm. If you designed the algorithm, both tests are important. If you implemented an algorithm of demonstrated performance, #2 should be sufficient. Although, I would probably test more than a few seeds and the sequences that resulted using some knowledge of the structure of your particular generator.
Way back in school, I was developing a random number generator for a simulation assignment, and needed some way to identify the non-randomness.
I got the bright idea of taking two random numbers and graphing them (x,y). It's amazing how the human brain can detect non-random patterns. ("Random Pattern" is an oxymoron.)
I tweaked the PRNG to remove the striations and starbursts that appeared on the graph, then plotted (log(x), log(y)) for a whole new perspective and repeated the process.
Later, I was forced to take statistics and learned that you can do weird math to quantify randomness.
Unless you are implementing a given PNRG algorithm there is no way to tell that the numbers are random, thats the nature of randomness. yeah on average as you number generator goes towards infinity it will even out, but you aren't going to test an infinite number of iterations.
If you are implementing a known algorithm, check the first few thousand numbers match the results they provide given a set of seeds. there is no way to be sure though as the number of possible seeds is infinite.
You can't even proove mathmatically that a sequence of numbers is random...
XKCD
DILBERT
Gets modded down...
Strictly, there's no way to test if the random generator is really random :-) First approach gives you the knowledge of can it keed distribution proper or not for a fixed amount of samples only, no matter how big this ammount is. The second approach can suport knowlege of does it behave like an algorythm, but again for a fixed ammount of samples.
The best you can do - use both.
You may find some of the responses to this question useful.
Basically, you can't "prove" that the RNG is random, but you can perform various tests to improve your confidence that it is. These tests vary in complexity. Diehard is comprehensive, but it doesn't really offer a yes/no answer, more like a couple of hundred "maybes". At the other end of the spectrum, it's pretty simple to generate a sequence of values (10,000 at least) and then check whether the mean and standard deviaton/variance are as expected.