How to understand autocorrelations caused by seedi

2019-06-21 19:30发布

问题:

In response to this question I ran the following VBA experiment:

Sub Test()
    Dim i As Long, A As Variant
    Dim count1 As Long, count2 As Long
    ReDim A(1 To 10000)

    For i = 1 To 10000
        Randomize
        A(i) = IIf(Rnd() < 0.5, 0, 1)
    Next i

    'count how often A(i) = A(i+1)
    For i = 1 To 9999
        If A(i) = A(i + 1) Then count1 = count1 + 1
    Next i

    For i = 1 To 10000
        A(i) = IIf(Rnd() < 0.5, 0, 1)
    Next i

    'count how often A(i) = A(i+1)
    For i = 1 To 9999
        If A(i) = A(i + 1) Then count2 = count2 + 1
    Next i

   Debug.Print "First Loop: " & count1
   Debug.Print "Second Loop: " & count2 & vbCrLf

End Sub

When I saw output like this:

First Loop: 5550
Second Loop: 4976

I was pretty sure that I knew what was happening: VBA was converting the system clock into something of lower resolution (perhaps microsecond) which as a consequence would lead to Randomize sometimes producing identical seeds in two or more passes through the loop. In my original answer I even confidently asserted this. But then I ran the code some more and noticed that the output was sometimes like this:

First Loop: 4449
Second Loop: 5042

The overseeding is still causing a noticeable autocorrelation -- but in the opposite (and unexpected) direction. Successive passes through the loop with the same seed should produce identical outputs, hence we should see successive values agreeing more often than chance would predict, not disagreeing more often than chance would predict.

Curious now, I modified the code to:

Sub Test2()
    Dim i As Long, A As Variant
    Dim count1 As Long, count2 As Long
    ReDim A(1 To 10000)

    For i = 1 To 10000
        Randomize
        A(i) = Rnd()
    Next i

    'count how often A(i) = A(i+1)
    For i = 1 To 9999
        If A(i) = A(i + 1) Then count1 = count1 + 1
    Next i

    For i = 1 To 10000
        A(i) = Rnd()
    Next i

    'count how often A(i) = A(i+1)
    For i = 1 To 9999
        If A(i) = A(i + 1) Then count2 = count2 + 1
    Next i

   Debug.Print "First Loop: " & count1
   Debug.Print "Second Loop: " & count2 & vbCrLf

End Sub

Which always gives the following output:

First Loop: 0
Second Loop: 0

It seems that it isn't the case that successive calls to Randomize sometimes returns the same seed (at least not often enough to make a difference).

But if that isn't the source of the autocorrelation -- what is? And -- why does it sometimes manifest itself as a negative rather than a positive autocorrelation?

回答1:

Partial answer only, fell free to edit and complete.


Well, there is clearly a correlation when you overuse the Randomize function.

I tried the following code, with a conditional formatting (black fill for values >0.5), and there is clearly patterns appearing (try to comment the Randomize to see a more "random" pattern. (best seen with 20 pt columns and 10% zoom)

Function Rndmap()
    Dim i As Long, j As Long
    Dim bmp(1 To 512, 1 To 512) As Long
    For i = 1 To 512
        For j = 1 To 512
            ' Rnd -1 ' uncomment this line to get a big white and black lines pattern.
            Randomize 'comment this line to have a random pattern
            bmp(i, j) = IIf(Rnd() < 0.5, 0, 1)
        Next j
    Next i
    Range(Cells(1, 1), Cells(512, 512)) = bmp
End Function

So while the MSDN states that "Using Randomize with the same value for number does not repeat the previous sequence.", implying that if the Timer returns twice the same value, the Rnd should keep on the same random sequence without reseting, there is still some behind the scene link..

Some screenshots:

Rnd() only:

Using Randomize:

Using Rnd -1 and Randomize:



回答2:

The Randomize method initialises the Rnd function with the current system time as it's seed, you can also specify a number with Randomize to be used as the seed.

I decided to test how long a sequence continues before repeating itself:

Sub randomRepeatTest()
    For i = 1 To 100000
        Randomize
        randomThread = randomThread & Int(9 * Rnd + 1)
        If i Mod 2 = 0 Then
            If Left(randomThread, i / 2) = Right(randomThread, i / 2) Then
                Debug.Print i / 2
                Exit Sub
            End If
        End If
    Next i
End Sub

This sub generates a random sequence of the digits 0 - 9, and as the sequence becomes an even length it is tested to see if the first half of the sequence matches the second half, and if so it outputs the length the sequence reached before repeating. After running it a number of times, and discounting where a digit is repeated twice at the beginning, the result comes out at 256 (nice).

Providing any value to Randomize will still return a result of 256.


We're randomizing Rnd every loop, so what's going on here?

Well as I said at the beginning, if no value is given to Randomize, it will use the system time as the seed. The resolution of this time is something I can't seem to find sourced, however I believe it to be low.

I have tested using the value of timer which returns the time of day in seconds to 2 decimal places (e.g. 60287.81). I have also tried GetTickCount which returns the system active time (starts counting at boot) in milliseconds. Both of these still result in the 256 sequence limit.

So, why when we're randomizing every loop does the sequence repeat? Well the reality is, the code is executed within a millisecond. Essentially, we're providing the same number to randomize every loop, and so we're not actually shuffling the seed.


So, is Rnd more random without Randomize?

I ran the above sub again without Randomize; nothing returned. I upped the loop count to 2,000,000; still nothing.

I've managed to source the algorithm used by the workbook Rand formula, which I believe is the same as Rnd with no initialised seed:

C IX, IY, IZ SHOULD BE SET TO INTEGER VALUES BETWEEN 1 AND 30000 BEFORE FIRST ENTRY

IX = MOD(171 * IX, 30269)

IY = MOD(172 * IY, 30307)

IZ = MOD(170 * IZ, 30323)

RANDOM = AMOD(FLOAT(IX) / 30269.0 + FLOAT(IY) / 30307.0 + FLOAT(IZ) / 30323.0, 1.0)

It is an iterative function which uses the result of the previous call to generate a new number. Referenced as the Wichman-Hill procedure, it guarantees that more than 10^13 numbers will be generated before the sequence repeats itself.


The problem with Rnd

For the algorithm to work, it first needs to be initialised with values for IX, IY & IZ. The problem we have here is that we can't initialise the algorithm with random variables, as it is this algorithm we need in order to get random values, so the only option is to provide some static values to get it going.

I have tested this and it seems to be the case. Opening a fresh instance of Excel, ? Rnd() returns 0.70554. Doing the same again returns the exact same number.

So the problem we have is Rnd without using Randomize gives us a much longer sequence of random numbers, however that sequence will start at the same place each time we open Excel. Where functions are dependant on random generation, such as password generation, this doesn't suffice as we will get the same repeated results each time we open Excel.


The solution

Here's a function I have come up with and it seems to work well:

Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal Milliseconds As LongPtr)
Public Declare Function GetTickCount Lib "kernel32" () As Long
Public randomCount As Long
Function getRandom()
    If randomCount Mod 255 = 0 Then
        Sleep 1
    End If
    Randomize GetTickCount
    getRandom = Rnd()
    randomCount = randomCount + 1
End Function

It makes use of the GetTickCount function as the Randomize seed. Each call adds 1 to a randomCount variable, and after every 255 runs the macro is forced to sleep for 1 millisecond (although this actually works out at around 15 on my system) so that the seed of GetTickCount will be changed, and so a new sequence of numbers will be returned by Rnd

This of course will return the same sequence if by chance it is used at the same system time, however for most cases it will be a sufficient method for generating more random numbers. If not, it would need some fancy work using something like the Random.Org API.



标签: vba random