Temporal correlations when employing System.Random

This question concerns the origins of temporal correlations one observes with System.Random when one generates successive randoms from successive seeds (where one discards the same number of generators for each seed).

In Using mkStdGen from System.Random to generate random booleans Answer 1 and Using mkStdGen from System.Random to generate random booleans Answer 2 it was suggested (based on the reddit article referenced theirin) that the first few generators should be discarded in order to get sensible results. However I find that irrespective of how many generators one discards, when one looks at the temporal aspect of the distribution one obtains undesirable results if successive random numbers are generated with successive seeds (with one discarding the same number of generators for each seed).

My question is what is it about the algorithm employed in System.Random that results in this temporal correlation between different seeds in the manner described?

If we generate an infinite sequence of random booleans, then the probability P(n) of getting n successive booleans which are of the same value (e.g. the [True,True,True] in [False,True,True,True,False]) is (1/2)^n. As a quick check we have the normalisation condition :

P(1)+P(2)+....P(infty) = (1/2) + (1/2)^2 + ... = 1

The following code :

module Main where
import Data.List
import System.Random

generateNthGenerator startGen 0 = startGen
generateNthGenerator startGen n = generateNthGenerator newGen (n-1)
  where newGen = snd $ ((random startGen) :: (Bool,StdGen)) 

better_mkStdGen generation seed = 
  generateNthGenerator (mkStdGen seed) generation

randomNums generation = 
  map (fst . random . (better_mkStdGen generation)) [0 .. maxBound] :: [Bool]
-- e.g. [True,True,False,False,False,True,True,True,False,False] 

sortedLengthOfConsecutives num randList = 
  sort $ map length $ take num $ group randList

frequencyOfConsecutives sortedLengthOfCons = 
  map (\x -> (head x, length x)) $ group sortedLengthOfCons

results = frequencyOfConsecutives 
            $ sortedLengthOfConsecutives 10000
                $ randomNums 10

main = do
  print results -- [(8,1493),(9,8507)]

generates each successive boolean using generators from the consecutive seed, discarding 10 generators before using the resulting random result. A sequence of 10000 random numbers is generated, and so we expect roughly 5000 booleans to be followed by the opposite boolean (e.g. [True] in [False,True,False,False]), for there to be 2500 booleans which are followed by the same boolean but then followed by the opposed boolean (e.g. [True,True] in [False,True,True,False]), about 1250 booleans which group into 3s, etc.

So from the code above we get 1493 8-groups and 8507 9-groups. This is not what is expected, and we get similar results irrespective of how many generators are discarded (in the example above the number of generators discarded for each seed is 10). [Note when we do the same experiment with tf-random we don't get the same behaviour, see later on].

If we instead generate successive booleans using the previously generated generator (which is I guess the fashion in which it was originally designed to be used, since random itself returns a new generator), we seem to get more reasonable results :

module Main where
import Data.List
import System.Random

generateRandomInner gen = 
  let (randBool, newGen) = (random gen)::(Bool,StdGen)
  in randBool:(generateRandomInner newGen)

generateRandoms seed =
  let (randBool, newGen) = (random $ mkStdGen seed)::(Bool,StdGen) 
  in randBool:(generateRandomInner newGen)

seed = 0

randomNums = generateRandoms seed

sortedLengthOfConsecutives num randList = 
  sort $ map length $ take num $ group randList

frequencyOfConsecutives sortedLengthOfCons = 
  map (\x -> (head x, length x)) $ group sortedLengthOfCons

results = frequencyOfConsecutives 
            $ sortedLengthOfConsecutives 10000
                $ randomNums
main = do 
  print results
  --[(1,4935),(2,2513),(3,1273),(4,663),(5,308),
  -- (6,141),(7,86),(8,45),(9,16),(10,12),(11,6),
  -- (12,1),(13,1)]

So we get 4935 singletons (roughly equals 0.5*10000), 2513 duples (roughly equals 0.5^2*10000), 1273 triples (roughly equals 0.5^3*10000) etc, which is what we expect.

So it seems that if we generate (via System.Random) a random sequence where each successive random is generated with the successive seed, where we discard the same number of generators for each seed, an undesirable correlation persists irrespective number of generators discarded.

What is it about the algorithmic properties of the random number generation of System.Random that result in this?

Note that if we employ the failing method above with tf-random (redditt article) then we get the expected results :

module Main where
import Data.List
import System.Random
import System.Random.TF

generateNthGenerator startGen 0 = startGen
generateNthGenerator startGen n = generateNthGenerator newGen (n-1)
  where newGen = snd $ ((random startGen) :: (Bool,TFGen)) 

better_mkStdGen generation seed = 
  generateNthGenerator (seedTFGen (0,0,0,seed)) generation

randomNums generation = 
  map (fst . random . (better_mkStdGen generation)) [0 .. maxBound] :: [Bool]
-- e.g. [True,True,False,False,False,True,True,True,False,False] 

sortedLengthOfConsecutives num randList = 
  sort $ map length $ take num $ group randList

frequencyOfConsecutives sortedLengthOfCons = 
  map (\x -> (head x, length x)) $ group sortedLengthOfCons

results = frequencyOfConsecutives 
            $ sortedLengthOfConsecutives 10000
                $ randomNums 10

main = do
  print results
  -- [(1,4867),(2,2573),(3,1243),(4,646),(5,329),
  -- (6,176),(7,80),(8,43),(9,26),(10,10),(11,4),
  -- (12,2),(19,1)]

i.e. 50% are singletons, 25% are duples (groups of 2), etc...

Let's begin by looking at what the code says, and we can get there.

First off, random as applied to Bool is equivalent to:

randomB :: StdGen -> (Bool, StdGen)
randomB g = let (i, g') = next g in (i `mod` 2 == 1, g')

In fact, if I replace random with randomB where that's appropriate in your program, I get identical results. The point is that for determining booleans, all we care about is whether the next Int value is even or odd.

Next, let's look at the definition of StdGen:

data StdGen 
 = StdGen Int32 Int32

So two Int32s are the state. Let's see how they're initialized with mkStdGen and how they're adjusted with next:

mkStdGen :: Int -> StdGen -- why not Integer ?
mkStdGen s = mkStdGen32 $ fromIntegral s

mkStdGen32 :: Int32 -> StdGen
mkStdGen32 s
 | s < 0     = mkStdGen32 (-s)
 | otherwise = StdGen (s1+1) (s2+1)
      where
    (q, s1) = s `divMod` 2147483562
    s2      = q `mod` 2147483398

...

stdNext :: StdGen -> (Int, StdGen)
-- Returns values in the range stdRange
stdNext (StdGen s1 s2) = (fromIntegral z', StdGen s1'' s2'')
    where   z'   = if z < 1 then z + 2147483562 else z
        z    = s1'' - s2''

        k    = s1 `quot` 53668
        s1'  = 40014 * (s1 - k * 53668) - k * 12211
        s1'' = if s1' < 0 then s1' + 2147483563 else s1'

        k'   = s2 `quot` 52774
        s2'  = 40692 * (s2 - k' * 52774) - k' * 3791
        s2'' = if s2' < 0 then s2' + 2147483399 else s2'

Note two interesting things:

The way s2 is initialized guarantees that it will be 1 unless you send a very high number to mkStdGen, in which case it will be 2 (there are fewer than 200 values in the Int32 range that will initialize s2 to 2.
The two halves of the state are kept distinct - the updated s2 depends only on the previous s2, not on the previous s1, and vice versa.

As a consequence, if you examine the generators that you get out of a certain fixed number of generations passed to better_mkStdGen, the second half of their state will always be identical.

Try it by adding this to your program:

print $ map (dropWhile (/= ' ') . show . better_mkStdGen 10) [0 .. 20]

So then the question is, why the mixing function in s1 fails to mix the last bit properly. Note that the way it's written, s1' and k will have the same parity as s1, so the s1 state only has different parity from the previous s1 state if s1' ends up being less than zero.

At this point I need to handwave a bit and say that the way s1' is computed means that if two initial values for s1 are close to each other, the two values for s1' will also be close together, and in general will only be 40014 times as far apart as they were initially, which in the range we allow for s1 makes neighboring values quite likely to end up on the same side of zero.