In response to this question I ran the following VBA experiment:
Sub Test()
Dim i As Long, A As Variant
Dim count1 As Long, count2 As Long
ReDim A(1 To 10000)
For i = 1 To 10000
Randomize
A(i) = IIf(Rnd() < 0.5, 0, 1)
Next i
'count how often A(i) = A(i+1)
For i = 1 To 9999
If A(i) = A(i + 1) Then count1 = count1 + 1
Next i
For i = 1 To 10000
A(i) = IIf(Rnd() < 0.5, 0, 1)
Next i
'count how often A(i) = A(i+1)
For i = 1 To 9999
If A(i) = A(i + 1) Then count2 = count2 + 1
Next i
Debug.Print "First Loop: " & count1
Debug.Print "Second Loop: " & count2 & vbCrLf
End Sub
When I saw output like this:
First Loop: 5550
Second Loop: 4976
I was pretty sure that I knew what was happening: VBA was converting the system clock into something of lower resolution (perhaps microsecond) which as a consequence would lead to Randomize
sometimes producing identical seeds in two or more passes through the loop. In my original answer I even confidently asserted this. But then I ran the code some more and noticed that the output was sometimes like this:
First Loop: 4449
Second Loop: 5042
The overseeding is still causing a noticeable autocorrelation -- but in the opposite (and unexpected) direction. Successive passes through the loop with the same seed should produce identical outputs, hence we should see successive values agreeing more often than chance would predict, not disagreeing more often than chance would predict.
Curious now, I modified the code to:
Sub Test2()
Dim i As Long, A As Variant
Dim count1 As Long, count2 As Long
ReDim A(1 To 10000)
For i = 1 To 10000
Randomize
A(i) = Rnd()
Next i
'count how often A(i) = A(i+1)
For i = 1 To 9999
If A(i) = A(i + 1) Then count1 = count1 + 1
Next i
For i = 1 To 10000
A(i) = Rnd()
Next i
'count how often A(i) = A(i+1)
For i = 1 To 9999
If A(i) = A(i + 1) Then count2 = count2 + 1
Next i
Debug.Print "First Loop: " & count1
Debug.Print "Second Loop: " & count2 & vbCrLf
End Sub
Which always gives the following output:
First Loop: 0
Second Loop: 0
It seems that it isn't the case that successive calls to Randomize
sometimes returns the same seed (at least not often enough to make a difference).
But if that isn't the source of the autocorrelation -- what is? And -- why does it sometimes manifest itself as a negative rather than a positive autocorrelation?
Partial answer only, fell free to edit and complete.
Well, there is clearly a correlation when you overuse the Randomize function.
I tried the following code, with a conditional formatting (black fill for values >0.5), and there is clearly patterns appearing (try to comment the Randomize to see a more "random" pattern. (best seen with 20 pt columns and 10% zoom)
Function Rndmap()
Dim i As Long, j As Long
Dim bmp(1 To 512, 1 To 512) As Long
For i = 1 To 512
For j = 1 To 512
' Rnd -1 ' uncomment this line to get a big white and black lines pattern.
Randomize 'comment this line to have a random pattern
bmp(i, j) = IIf(Rnd() < 0.5, 0, 1)
Next j
Next i
Range(Cells(1, 1), Cells(512, 512)) = bmp
End Function
So while the MSDN states that "Using Randomize with the same value for number does not repeat the previous sequence.", implying that if the Timer returns twice the same value, the Rnd should keep on the same random sequence without reseting, there is still some behind the scene link..
Some screenshots:
Rnd() only:
Using Randomize:
Using Rnd -1 and Randomize:
The Randomize
method initialises the Rnd
function with the current system time as it's seed, you can also specify a number with Randomize
to be used as the seed.
I decided to test how long a sequence continues before repeating itself:
Sub randomRepeatTest()
For i = 1 To 100000
Randomize
randomThread = randomThread & Int(9 * Rnd + 1)
If i Mod 2 = 0 Then
If Left(randomThread, i / 2) = Right(randomThread, i / 2) Then
Debug.Print i / 2
Exit Sub
End If
End If
Next i
End Sub
This sub generates a random sequence of the digits 0 - 9, and as the sequence becomes an even length it is tested to see if the first half of the sequence matches the second half, and if so it outputs the length the sequence reached before repeating. After running it a number of times, and discounting where a digit is repeated twice at the beginning, the result comes out at 256 (nice).
Providing any value to Randomize
will still return a result of 256.
We're randomizing Rnd
every loop, so what's going on here?
Well as I said at the beginning, if no value is given to Randomize
, it will use the system time as the seed. The resolution of this time is something I can't seem to find sourced, however I believe it to be low.
I have tested using the value of timer
which returns the time of day in seconds to 2 decimal places (e.g. 60287.81). I have also tried GetTickCount
which returns the system active time (starts counting at boot) in milliseconds. Both of these still result in the 256 sequence limit.
So, why when we're randomizing every loop does the sequence repeat? Well the reality is, the code is executed within a millisecond. Essentially, we're providing the same number to randomize every loop, and so we're not actually shuffling the seed.
So, is Rnd
more random without Randomize
?
I ran the above sub again without Randomize
; nothing returned. I upped the loop count to 2,000,000; still nothing.
I've managed to source the algorithm used by the workbook Rand
formula, which I believe is the same as Rnd
with no initialised seed:
C IX, IY, IZ SHOULD BE SET TO INTEGER VALUES BETWEEN 1 AND 30000 BEFORE FIRST ENTRY
IX = MOD(171 * IX, 30269)
IY = MOD(172 * IY, 30307)
IZ = MOD(170 * IZ, 30323)
RANDOM = AMOD(FLOAT(IX) / 30269.0 + FLOAT(IY) / 30307.0 + FLOAT(IZ) / 30323.0, 1.0)
It is an iterative function which uses the result of the previous call to generate a new number. Referenced as the Wichman-Hill procedure, it guarantees that more than 10^13 numbers will be generated before the sequence repeats itself.
The problem with Rnd
For the algorithm to work, it first needs to be initialised with values for IX
, IY
& IZ
. The problem we have here is that we can't initialise the algorithm with random variables, as it is this algorithm we need in order to get random values, so the only option is to provide some static values to get it going.
I have tested this and it seems to be the case. Opening a fresh instance of Excel, ? Rnd()
returns 0.70554. Doing the same again returns the exact same number.
So the problem we have is Rnd
without using Randomize
gives us a much longer sequence of random numbers, however that sequence will start at the same place each time we open Excel. Where functions are dependant on random generation, such as password generation, this doesn't suffice as we will get the same repeated results each time we open Excel.
The solution
Here's a function I have come up with and it seems to work well:
Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal Milliseconds As LongPtr)
Public Declare Function GetTickCount Lib "kernel32" () As Long
Public randomCount As Long
Function getRandom()
If randomCount Mod 255 = 0 Then
Sleep 1
End If
Randomize GetTickCount
getRandom = Rnd()
randomCount = randomCount + 1
End Function
It makes use of the GetTickCount
function as the Randomize
seed. Each call adds 1 to a randomCount
variable, and after every 255 runs the macro is forced to sleep for 1 millisecond (although this actually works out at around 15 on my system) so that the seed of GetTickCount
will be changed, and so a new sequence of numbers will be returned by Rnd
This of course will return the same sequence if by chance it is used at the same system time, however for most cases it will be a sufficient method for generating more random numbers. If not, it would need some fancy work using something like the Random.Org API.