This question concerns the origins of temporal correlations one observes with System.Random
when one generates successive randoms from successive seeds (where one discards the same number of generators for each seed).
In Using mkStdGen from System.Random to generate random booleans Answer 1 and Using mkStdGen from System.Random to generate random booleans Answer 2 it was suggested (based on the reddit article referenced theirin) that the first few generators should be discarded in order to get sensible results. However I find that irrespective of how many generators one discards, when one looks at the temporal aspect of the distribution one obtains undesirable results if successive random numbers are generated with successive seeds (with one discarding the same number of generators for each seed).
My question is what is it about the algorithm employed in System.Random
that results in this temporal correlation between different seeds in the manner described?
If we generate an infinite sequence of random booleans, then the probability P(n)
of getting n
successive booleans which are of the same value (e.g. the [True,True,True]
in [False,True,True,True,False]
) is (1/2)^n
. As a
quick check we have the normalisation condition :
P(1)+P(2)+....P(infty) = (1/2) + (1/2)^2 + ... = 1
The following code :
module Main where
import Data.List
import System.Random
generateNthGenerator startGen 0 = startGen
generateNthGenerator startGen n = generateNthGenerator newGen (n-1)
where newGen = snd $ ((random startGen) :: (Bool,StdGen))
better_mkStdGen generation seed =
generateNthGenerator (mkStdGen seed) generation
randomNums generation =
map (fst . random . (better_mkStdGen generation)) [0 .. maxBound] :: [Bool]
-- e.g. [True,True,False,False,False,True,True,True,False,False]
sortedLengthOfConsecutives num randList =
sort $ map length $ take num $ group randList
frequencyOfConsecutives sortedLengthOfCons =
map (\x -> (head x, length x)) $ group sortedLengthOfCons
results = frequencyOfConsecutives
$ sortedLengthOfConsecutives 10000
$ randomNums 10
main = do
print results -- [(8,1493),(9,8507)]
generates each successive boolean using generators from the consecutive seed, discarding 10 generators before using the resulting random result. A sequence of 10000 random numbers is generated, and so we expect roughly 5000 booleans to be followed by the opposite boolean (e.g. [True]
in [False,True,False,False]
), for there to be 2500 booleans which are followed by the same boolean but then followed by the opposed boolean (e.g. [True,True]
in [False,True,True,False]
), about 1250 booleans which group into 3s, etc.
So from the code above we get 1493 8-groups and 8507 9-groups. This is not what is expected, and we get similar results irrespective of how many generators are discarded (in the example above the number of generators discarded for each seed is 10). [Note when we do the same experiment with tf-random
we don't get the same behaviour, see later on].
If we instead generate successive booleans using the previously generated generator (which is I guess the fashion in which it was originally designed to be used, since random
itself returns a new generator), we seem to get more reasonable results :
module Main where
import Data.List
import System.Random
generateRandomInner gen =
let (randBool, newGen) = (random gen)::(Bool,StdGen)
in randBool:(generateRandomInner newGen)
generateRandoms seed =
let (randBool, newGen) = (random $ mkStdGen seed)::(Bool,StdGen)
in randBool:(generateRandomInner newGen)
seed = 0
randomNums = generateRandoms seed
sortedLengthOfConsecutives num randList =
sort $ map length $ take num $ group randList
frequencyOfConsecutives sortedLengthOfCons =
map (\x -> (head x, length x)) $ group sortedLengthOfCons
results = frequencyOfConsecutives
$ sortedLengthOfConsecutives 10000
$ randomNums
main = do
print results
--[(1,4935),(2,2513),(3,1273),(4,663),(5,308),
-- (6,141),(7,86),(8,45),(9,16),(10,12),(11,6),
-- (12,1),(13,1)]
So we get 4935 singletons (roughly equals 0.5*10000), 2513 duples (roughly equals 0.5^2*10000), 1273 triples (roughly equals 0.5^3*10000) etc, which is what we expect.
So it seems that if we generate (via System.Random
) a random sequence where each successive random is generated with the successive seed, where we discard the same number of generators for each seed, an undesirable correlation persists irrespective number of generators discarded.
What is it about the algorithmic properties of the random number generation of
System.Random
that result in this?
Note that if we employ the failing method above with tf-random
(redditt article) then we get the expected results :
module Main where
import Data.List
import System.Random
import System.Random.TF
generateNthGenerator startGen 0 = startGen
generateNthGenerator startGen n = generateNthGenerator newGen (n-1)
where newGen = snd $ ((random startGen) :: (Bool,TFGen))
better_mkStdGen generation seed =
generateNthGenerator (seedTFGen (0,0,0,seed)) generation
randomNums generation =
map (fst . random . (better_mkStdGen generation)) [0 .. maxBound] :: [Bool]
-- e.g. [True,True,False,False,False,True,True,True,False,False]
sortedLengthOfConsecutives num randList =
sort $ map length $ take num $ group randList
frequencyOfConsecutives sortedLengthOfCons =
map (\x -> (head x, length x)) $ group sortedLengthOfCons
results = frequencyOfConsecutives
$ sortedLengthOfConsecutives 10000
$ randomNums 10
main = do
print results
-- [(1,4867),(2,2573),(3,1243),(4,646),(5,329),
-- (6,176),(7,80),(8,43),(9,26),(10,10),(11,4),
-- (12,2),(19,1)]
i.e. 50% are singletons, 25% are duples (groups of 2), etc...