My understanding is that using set.seed ensures reproducibility, but this is not the case with the following R code in R 2.15.2. Am I missing something here?
set.seed(12345)
rnorm(5)
[1] 0.5855288 0.7094660 -0.1093033 -0.4534972 0.6058875
rnorm(5)
[1] -1.8179560 0.6300986 -0.2761841 -0.2841597 -0.9193220
set.seed() reinitializes the random number generator, so you need to call it again with the same seed before each draw to get the same numbers back:
set.seed(12345)
rnorm(5)
[1] 0.5855288 0.7094660 -0.1093033 -0.4534972 0.6058875
set.seed(12345)
rnorm(5)
[1] 0.5855288 0.7094660 -0.1093033 -0.4534972 0.6058875
set.seed(12345)
rnorm(5)
[1] 0.5855288 0.7094660 -0.1093033 -0.4534972 0.6058875
Any call that uses the random number generator will change the current seed (the generator state stored in .Random.seed), even if you've manually set it with set.seed.
set.seed(1)
x <- .Random.seed # get the current seed
runif(10) # uses random number generator, so changes current seed
y <- .Random.seed
identical(x, y) # FALSE
As @StephanKolassa demonstrates, you'd have to reset the seed before each use of the random number generator to guarantee that it uses the same one each time.
It's worth underlining here that the sequence of numbers is still reproducible each time you set the seed, because of this reinitialisation.
So although each subsequent call to e.g. rnorm gives you different answers, you still get the same sequence of numbers from the point the seed was set.
E.g., per the original question:
set.seed(12345)
rnorm(5)
[1] 0.5855288 0.7094660 -0.1093033 -0.4534972 0.6058875
rnorm(5)
[1] -1.8179560 0.6300986 -0.2761841 -0.2841597 -0.9193220
Produces the same sequence of 10 numbers as:
set.seed(12345)
rnorm(10)
[1] 0.5855288 0.7094660 -0.1093033 -0.4534972 0.6058875
-1.8179560 0.6300986 -0.2761841 -0.2841597 -0.9193220
Or
set.seed(12345)
rnorm(7)
[1] 0.5855288 0.7094660 -0.1093033 -0.4534972 0.6058875
-1.8179560 0.6300986
rnorm(3)
[1] -0.2761841 -0.2841597 -0.9193220
Or any other series of calls to rnorm that draws 10 numbers in total.
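If you'd rather check this than compare printed output by eye, something like the following should do it. It is only a sketch: the variable names (single_draw, split_5_5, split_7_3) are made up for illustration, and it assumes the default normal generator, as in the examples above.

```r
# Draw 10 numbers in one call.
set.seed(12345)
single_draw <- rnorm(10)

# Draw the same 10 numbers as 5 + 5.
set.seed(12345)
split_5_5 <- c(rnorm(5), rnorm(5))

# Draw the same 10 numbers as 7 + 3.
set.seed(12345)
split_7_3 <- c(rnorm(7), rnorm(3))

# All three should be the same sequence.
identical(single_draw, split_5_5)
identical(single_draw, split_7_3)
```

Both identical() calls should return TRUE, since each version walks the same stream of random numbers from the same starting seed.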
The point here is that if you set the seed once at the start of a script you will get the same set of random numbers generated each time you run that whole script, while getting a different set of numbers from each random number generator call within the code. This is because you are running on the same sequence from that seed at the start. This can be a good thing, and it means that if you want a reproducible script you can set the seed once at the beginning.
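As a rough illustration, here is a stand-in for "a whole script" wrapped in a function so it can be run twice in one session. The function name run_script and its contents are hypothetical, not anything from the question; the only point is that the seed is set once at the top.

```r
# Hypothetical stand-in for a script: seed once, then make several random draws.
run_script <- function(seed) {
  set.seed(seed)      # set the seed once, at the start
  first  <- rnorm(5)  # first draw
  second <- rnorm(5)  # second draw continues the same sequence
  list(first = first, second = second)
}

run_a <- run_script(12345)
run_b <- run_script(12345)

identical(run_a, run_b)                # whole "script" reproduces: TRUE
identical(run_a$first, run_a$second)   # draws within a run differ: FALSE
```

So the run as a whole is reproducible, even though the individual calls inside it return different numbers.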