Questions about set.seed() in R

Posted 2019-01-23 13:27

I understand what set.seed() does and when I might use it, but I still have many questions about the function. Here are a few:

  1. Is it possible to "reset" set.seed() to something "more random" if you have called set.seed() earlier in your session? Is that even necessary?
  2. Is it possible to view the seed that R is currently using?
  3. Is there a way to make set.seed() allow alphanumeric seeds, the way one can enter them at random.org (be sure you are in the advanced mode, and see "Part 3" of the form to see what I mean)?

Tags: r random
5 answers
啃猪蹄的小仙女
#2 · 2019-01-23 13:34

I had the same issue as in question 1. I found that I can simply reset the seed inside the loop:

set.seed(123)
x <- rnorm(10, 1, 1)
set.seed(NULL)  # NULL must be upper case; it re-initializes the generator

This way, at the end of each iteration the fixed seed is discarded and the generator is re-initialized as if no seed had been set. It worked for me.
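
A minimal sketch of the behaviour described above, assuming the documented semantics of set.seed(NULL) (re-initializing the generator as if no seed had yet been set). The variable names are illustrative only:

```r
# With a fixed seed, the same draws come back every time.
set.seed(123)
first <- rnorm(3)
set.seed(123)
second <- rnorm(3)
identical(first, second)  # TRUE

# Re-initialize the generator: later draws no longer depend on seed 123.
set.seed(NULL)
after_reset <- rnorm(3)
```

Note that after set.seed(NULL) the results are no longer reproducible, which may or may not be what you want inside a loop.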

够拽才男人
#3 · 2019-01-23 13:35

For your question 3, there is the char2seed function in the TeachingDemos package, which takes a character string (alphanumeric), converts it to an integer, and by default uses that integer to set a new seed. The idea was that students could use their name (or some combination/subset of names) as a seed, so that each student gets a different dataset but the teacher can reproduce each student's dataset.
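
To illustrate the general idea in base R only, here is a rough sketch. string_to_seed is a hypothetical helper written for this example; it is not the algorithm char2seed uses, just one simple way to turn a string into an integer seed:

```r
# Hypothetical helper (NOT the TeachingDemos algorithm): map a string to an
# integer seed via a position-weighted sum of its character codes.
string_to_seed <- function(s) {
  codes <- as.numeric(utf8ToInt(s))
  as.integer(sum(codes * seq_along(codes)) %% .Machine$integer.max)
}

# The same name always yields the same "dataset".
set.seed(string_to_seed("alice"))
alice_data <- rnorm(5)

set.seed(string_to_seed("alice"))
alice_again <- rnorm(5)

identical(alice_data, alice_again)  # TRUE
```

A weighted sum like this is easy to follow but collides readily; a hash-based mapping would spread seeds more evenly.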

狗以群分
#4 · 2019-01-23 13:39

For an answer to 2, first see the help page ?RNGkind.

To find the kind of RNG in use:

RNGkind()
# [1] "Mersenne-Twister" "Inversion" 

The Mersenne Twister is the default.

From the help page:

‘"Mersenne-Twister":’ From Matsumoto and Nishimura (1998). A twisted GFSR with period 2^19937 - 1 and equidistribution in 623 consecutive dimensions (over the whole period). The ‘seed’ is a 624-dimensional set of 32-bit integers plus a current position in that set.

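To find the current seed in use, look at .Random.seed. In a fresh session it does not exist until the random number generator has been used (or set.seed() has been called), so call the generator first.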

runif(1, 0, 1)
# [1] 0.9834062
.Random.seed
# [Gives a 626 length vector]

Calling set.seed(some_integer) and then inspecting .Random.seed will always give the same 626-length vector for the same some_integer. To put it differently, the 626-length vector is determined solely by some_integer, provided one is using the Mersenne Twister, of course.
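
As a small sketch of the point above, assuming the default Mersenne Twister generator: the state can be read from the global environment, compared, and even restored to replay a stream. The variable names here are illustrative:

```r
# The same integer seed always produces the same internal state.
set.seed(42)
state_a <- get(".Random.seed", envir = globalenv())
set.seed(42)
state_b <- get(".Random.seed", envir = globalenv())
identical(state_a, state_b)  # TRUE
length(state_a)              # 626 under the default Mersenne Twister

# Save the state, draw, restore the state, and draw again.
saved <- get(".Random.seed", envir = globalenv())
x <- runif(3)
assign(".Random.seed", saved, envir = globalenv())
y <- runif(3)
identical(x, y)  # TRUE
```

Restoring .Random.seed by hand is mostly useful for resuming a stream mid-session; for ordinary reproducibility set.seed() is the simpler tool.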

Also, of course, calling set.seed with a fixed value will give you the same values from the calls to random number routines that follow it. That is its main use in practice: reproducibility. E.g.

set.seed(1)
runif(5, 0, 1)
# [1] 0.2655087 0.3721239 0.5728534 0.9082078 0.2016819
rnorm(1, 0, 1)
# [1] 1.272429
set.seed(1)
runif(5, 0, 1)
# [1] 0.2655087 0.3721239 0.5728534 0.9082078 0.2016819
rnorm(1, 0, 1)
# [1] 1.272429

All the basic number generator code in R is in the file src/main/RNG.c in the source code.

It is in C, but fairly easy to follow.

贼婆χ
#5 · 2019-01-23 13:50

Just for fun:

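set.seed.alpha <- function(x) {
  # Requires the digest package; stop early if it is not installed
  stopifnot(requireNamespace("digest", quietly = TRUE))
  # Hash x (any R object) to a CRC32 hex string and prefix it with "0x"
  hexval <- paste0("0x", digest::digest(x, "crc32"))
  # Convert the hex string to a number and reduce it to the integer range
  intval <- type.convert(hexval, as.is = TRUE) %% .Machine$integer.max
  set.seed(intval)
}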

So you can do:

set.seed.alpha("hello world")

(in fact x can be any R object, not just an alphanumeric string)

ゆ 、 Hurt°
#6 · 2019-01-23 13:56

It's possible, if you set the seed to something like the final digits of your time epoch, but it's really not necessary. The intended use of PRNGs is that you set the seed once at the start of a session, and use successive generated variates from this. Do things differently, and you don't get to enjoy the various good theoretical and empirical properties the R RNGs have.

But I'm not sure you really understand the purpose of set.seed. It's not really there for you to get 'more random' numbers. If you are doing some kind of application for which the R PRNG is insufficient (for instance, if you require cryptographic randomness), you might as well generate all your random numbers by some alternate method and use them directly. The real purpose of set.seed is to produce reproducibility in results using RNGs. If you start the same analysis using the same sequence of random number generations, and set the seed to the same value, you will always get the same result. This is helpful in debugging, and for others reviewing your results.
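
A short sketch of why re-seeding repeatedly works against you, contrasting one seed at the start with the same seed set inside a loop. The seed value 2019 and the variable names are arbitrary choices for illustration:

```r
# Seed once, then draw repeatedly: each iteration continues the stream.
set.seed(2019)
seeded_once <- sapply(1:3, function(i) runif(1))

# Re-seed with the same value inside the loop: every iteration restarts
# the stream and repeats the same draw.
reseeded <- sapply(1:3, function(i) {
  set.seed(2019)
  runif(1)
})

length(unique(reseeded))         # 1
reseeded[1] == seeded_once[1]    # TRUE: both start from the same state
```

Seeding once at the top keeps the run reproducible as a whole while still letting each draw advance the generator.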

To use the epoch time, do something like

t <- as.numeric(Sys.time())
seed <- 1e8 * (t - floor(t))
set.seed(seed); print(seed)