How to “check” if a function really gives a random

2019-06-14 23:47发布

问题:

How can one be sure that a function is really random or as close to the notion as possible? Also, what is the distinction between random and pseudo-random? Finally, what algorithms/sources can be used to generate random numbers?

P.S: Also asking this because a MySQL statement using ORDER BY RAND() LIMIT 1 isn't giving convincing results.

回答1:

Aloha!

There are several methods and tools for testing for randomness. These are applied on a set of numbers collected from the generator to be tested. That is, you test the generator based on a set of data generated.

In computing, esp for IT-security we normally want to have a generator that conforms to a uniform random process. There are many different processes, but I'm guessing that it is a uniform process you are aiming for.

NIST has published several documents with recommendations on both pseudo random number generators as well how to test them. Look at NIST documents SP 800-22 and SP 800-20.

As somebody else pointed out. If you want a True Random Number Generator (TRNG) you need to gather physical entropy. Examples of such sources are radioactive decay, cosmic radiation, lava lamps etc. Preferably you want sources that are hard to manipulate. IETF has an RFC that have some good recommendations, see RFC 4086 - Source of Randomness for Security: http://tools.ietf.org/html/rfc4086

What you normally do is to collect entropy from one ore more (preferably more than one) source. The collected data is then filtered (whitening) and finally used to periodically seed a good PRNG. With different seeds, naturally.

This is how most modern good random generators works. An entropy collector feeding a PRNG created using cryptographic primitives such as symmetric ciphers (AES for example) or hash functions. See for example the random generator Yarrow/Fortuna by Schneier, which in modified form is used in FreeBSD.

Coming back to your question about testing. As somebody pointed out Marsaglia have produced a good set of tests, which was codified in the DIEHARD tests. There are now an even more exapnded set of tests in the Dieharder tests: http://www.phy.duke.edu/~rgb/General/dieharder.php

Dieharder is a good tool that will give you good confidence that the huge pile of numbers supplied to it (collected from your generator) is random (with good quality) or not. Running Dieharder is easy, but will take some time.

In situ testing of randomness is hard. You don't normally want to implement Dieharder in your system. What you can do is implement some simple detectors that should detect patholigical cases. I usually suggest:

  • Equal value length. A simple counter that is reset whenever two consequtive values generated by the RNG differs. And then you need to define a threshold when you think the counter shows that the RNG is broken. If you see 10 million equal values and the value space is greater that one value (the one you see) your RNG is probably not working all that well. Esp if the value are seeing is one of the edge values. For example 0x00000.... or 0xfffff...

  • Median value. If you after generating a million values and have a uniform distribution have a median value that is heavily leaning towards one of the value space edges, not close to the middle, someting is probably also amiss.

  • Variance. If you after generating million of values haven't seen values close to the MIN and MAX of the value space, but instead have a narrow generated value space, then something is also amiss.

Finally. Since you hopefully are using a good PRNG (based on AES for example), the in situ-tests suggested might instead be applied on the entropy source.

I hope that helped in some ways.



回答2:

The thing about being random is that you can't tell if the return from a random function is random or not.

...or...

Proper random uses something that can truly be random, such as white noise. Pseudo random numbers are generally calculated from mathematical formulae or precomputed tables. The Linear congruential generator is a popular method of generating them.

To get a real random number, you generally want to interface with an outside source where something has been generated organically. This is called a True Random Number Generator.



回答3:

There are statistical tests you can apply to see how likely it is that a given sequence of numbers were independent, identically distributed (iid) random variables.

Take a look at A Current View of Random Number Generators by George Marsaglia. In particular, take a look at sections 6-12. This provides an introduction to such tests followed by several that you can apply.



回答4:

True, We can not guarantee the random number is actually a random .
about pseudo-random numbers : yes they just seems to be random ( Originally used in cryptography) (pseudo random functions ), when sending encrypted text and the evil in between traps the message thinks that the encrypted text he got is random, but the message was calculated from some function, moreover you will get the same message using the same function and key ( if any , so no-where they are not random, just look like random because you can not create the original text/number from which it generate. Such as hash functions(md5,sha1) and encryption techniques ( des,aes etc ).



回答5:

For the number to be random, it must not be possible to predict it. So, any algorithm that generates "random" numbers generates pseudo-random numbers, as it is always possible to generate the same sequence of "random" numbers, using prievously used seed or value that is used during "randomizing". Truly random number can be generated by for example dice roll, but not computer algorithm.



回答6:

The theoretical computer science teaches that a computer is a deterministic machine. Every algorithm always runs the same way, so you have to vary your seed. But where should a computer get a random seed from? From an external device? The CPU temperature (which would not vary much)?



回答7:

To test a function that returns random numbers you should call it many times and see how many times each number is returned.

For example

For i := 1 to 1000000 do // Test the function 1.000.000 times
begin
   RandomNumber := Rand(9); // Random numbers from 0 to 9
   case RandomNumber of
      1 : Returned0 := Returned0 + 1;
      1 : Returned1 := Returned1 + 1;
      1 : Returned2 := Returned2 + 1;
      1 : Returned3 := Returned3 + 1;
      1 : Returned4 := Returned4 + 1;
      1 : Returned5 := Returned5 + 1;
      1 : Returned6 := Returned6 + 1;
      1 : Returned7 := Returned7 + 1;
      1 : Returned8 := Returned8 + 1;
      1 : Returned9 := Returned9 + 1;
   end;
end

WriteLn('0: ', Returned0);
WriteLn('1: ', Returned1);
WriteLn('2: ', Returned2);
WriteLn('3: ', Returned3);
WriteLn('4: ', Returned4);
WriteLn('5: ', Returned5);
WriteLn('6: ', Returned6);
WriteLn('7: ', Returned7);
WriteLn('8: ', Returned8);
WriteLn('9: ', Returned9);

A perfect output should be equal numbers for each random output. Something like:

0: 100000
1: 100000
2: 100000
3: 100000
4: 100000
5: 100000
6: 100000
7: 100000
8: 100000
9: 100000