When I want to generate a random number with runif()
within a specific interval under exclusion of a particular value (e.g. 0.5) I can write this function ex.runif()
who does the job, but it is hundreds of times slower than the normal runif()
. Could anyone point me to a better solution?
ex.runif <- function(n, excl, min, max) {
# ex.runif() excludes the specific value 'excl'
q <- excl
while (q == excl) {
q <- runif(n, min = min, max = max)
}
return(q)
}
set.seed(42)
ex.runif(1, .5, .25, .75) # exclude .5, interval [.25, .75]
# [1] 0.707403
library(microbenchmark)
microbenchmark(ex.runif(1, .5, .25, .75), runif(1, min = .25, max = .75))
# Unit: microseconds
# expr min lq mean median uq max neval cld
# ex.runif 692.439 704.685 721.51135 715.2735 722.9275 962.373 100 b
# runif 2.041 2.551 3.49044 2.8070 3.3170 21.176 100 a
If the set of values that you want to exclude is finite, then, in most cases, there is no need for a function like that. The reason is that the uniform distribution is continuous and any finite number of values are taken with probability zero. That is, q == excl
is, in terms of probability theory, true with probability zero.
For instance,
set.seed(42)
ex.runif(5, .5, .25, .75)
# [1] 0.7074030 0.7185377 0.3930698 0.6652238 0.5708728
set.seed(42)
runif(5, 0.25, 0.75)
# [1] 0.7074030 0.7185377 0.3930698 0.6652238 0.5708728
The same is most likely going to happen under any other seed as well. Thus, you may just keep using runif
.
@duckmayr makes a good point about numeric precision. In fact, as the interval [min, max]
is getting narrower, q == excl
becomes true with increasingly high probability and, in some applications, it may even become relevant.
However, if in theory you indeed need to exclude only a single value 0.5
, then performing a check like q == excl
might even do harm by excluding unnecessary draws.
For instance, in my case .Machine$double.eps
is 2.220446e-16. Then the probability of getting a draw from [0.5 - .Machine$double.eps / 4, 0.5 + .Machine$double.eps / 4]
when [min,max]
is [0.5 - 10^(-k), 0.5 + 10^(-k)]
and making a false conclusion is 2 * (2.220446e-16 / 4) / (2 * 10^(-k)) or around 0.55 * 10^(k-16).