How to exclude specific result of runif()?

2019-03-01 00:46发布

When I want to generate a random number with runif() within a specific interval under exclusion of a particular value (e.g. 0.5) I can write this function ex.runif() who does the job, but it is hundreds of times slower than the normal runif(). Could anyone point me to a better solution?

ex.runif <- function(n, excl, min, max) {
  # ex.runif() excludes the specific value 'excl'
  q <- excl
  while (q == excl) {
    q <- runif(n, min = min, max = max)
  } 
  return(q)
}

set.seed(42)
ex.runif(1, .5, .25, .75)  # exclude .5, interval [.25, .75]
# [1] 0.707403

library(microbenchmark)
microbenchmark(ex.runif(1, .5, .25, .75), runif(1, min = .25, max = .75))
# Unit: microseconds
# expr         min      lq      mean   median       uq     max neval cld
# ex.runif 692.439 704.685 721.51135 715.2735 722.9275 962.373   100   b
# runif      2.041   2.551   3.49044   2.8070   3.3170  21.176   100  a 

1条回答
叛逆
2楼-- · 2019-03-01 00:55

If the set of values that you want to exclude is finite, then, in most cases, there is no need for a function like that. The reason is that the uniform distribution is continuous and any finite number of values are taken with probability zero. That is, q == excl is, in terms of probability theory, true with probability zero.

For instance,

set.seed(42)
ex.runif(5, .5, .25, .75)
# [1] 0.7074030 0.7185377 0.3930698 0.6652238 0.5708728
set.seed(42)
runif(5, 0.25, 0.75)
# [1] 0.7074030 0.7185377 0.3930698 0.6652238 0.5708728

The same is most likely going to happen under any other seed as well. Thus, you may just keep using runif.

@duckmayr makes a good point about numeric precision. In fact, as the interval [min, max] is getting narrower, q == excl becomes true with increasingly high probability and, in some applications, it may even become relevant.

However, if in theory you indeed need to exclude only a single value 0.5, then performing a check like q == excl might even do harm by excluding unnecessary draws.

For instance, in my case .Machine$double.eps is 2.220446e-16. Then the probability of getting a draw from [0.5 - .Machine$double.eps / 4, 0.5 + .Machine$double.eps / 4] when [min,max] is [0.5 - 10^(-k), 0.5 + 10^(-k)] and making a false conclusion is 2 * (2.220446e-16 / 4) / (2 * 10^(-k)) or around 0.55 * 10^(k-16).

查看更多
登录 后发表回答