When I want to generate a random number with runif()
within a specific interval under exclusion of a particular value (e.g. 0.5) I can write this function ex.runif()
who does the job, but it is hundreds of times slower than the normal runif()
. Could anyone point me to a better solution?
ex.runif <- function(n, excl, min, max) {
# ex.runif() excludes the specific value 'excl'
q <- excl
while (q == excl) {
q <- runif(n, min = min, max = max)
}
return(q)
}
set.seed(42)
ex.runif(1, .5, .25, .75) # exclude .5, interval [.25, .75]
# [1] 0.707403
library(microbenchmark)
microbenchmark(ex.runif(1, .5, .25, .75), runif(1, min = .25, max = .75))
# Unit: microseconds
# expr min lq mean median uq max neval cld
# ex.runif 692.439 704.685 721.51135 715.2735 722.9275 962.373 100 b
# runif 2.041 2.551 3.49044 2.8070 3.3170 21.176 100 a
If the set of values that you want to exclude is finite, then, in most cases, there is no need for a function like that. The reason is that the uniform distribution is continuous and any finite number of values are taken with probability zero. That is,
q == excl
is, in terms of probability theory, true with probability zero.For instance,
The same is most likely going to happen under any other seed as well. Thus, you may just keep using
runif
.@duckmayr makes a good point about numeric precision. In fact, as the interval
[min, max]
is getting narrower,q == excl
becomes true with increasingly high probability and, in some applications, it may even become relevant.However, if in theory you indeed need to exclude only a single value
0.5
, then performing a check likeq == excl
might even do harm by excluding unnecessary draws.For instance, in my case
.Machine$double.eps
is 2.220446e-16. Then the probability of getting a draw from[0.5 - .Machine$double.eps / 4, 0.5 + .Machine$double.eps / 4]
when[min,max]
is[0.5 - 10^(-k), 0.5 + 10^(-k)]
and making a false conclusion is 2 * (2.220446e-16 / 4) / (2 * 10^(-k)) or around 0.55 * 10^(k-16).