I want to create a function that generates a random number depending on its input and apply it to a boolean vector. This function will be used to generate test data with approximately 500M elements.
f <- function(x, p) ifelse(x, runif(1)^p, runif(1)^(1/p))
f(c(T,T,T,F,F,F), 2)
What I get is not what I wanted.
[1] 0.0054 0.0054 0.0054 0.8278 0.8278 0.8278
I'd expect a new random number for every element of my input vector, not two random numbers repeated. Why do I get this result, and how can I get the desired result, which would be the same as
c(runif(3)^2, runif(3)^(1/2))
which yields a new random number for every element
0.0774 0.7071 0.2184 0.8719 0.9990 0.8819
You would need to make two different vectors of random numbers, each the same length as the x vector.
f <- function(x, p) ifelse(x, runif(6)^p, runif(6)^(1/p))
f(c(T,T,T,F,F,F), 2)
[1] 0.3040201 0.5543376 0.7291466 0.5205014 0.3563542 0.8697398
Or more generally:
f <- function(x, p) ifelse(x, runif( length(x) )^p, runif( length(x) )^(1/p))
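The ifelse function does not really loop. Its second and third arguments are each evaluated only once, and the resulting values are then recycled to the length of the test vector, so a single runif(1) draw is repeated across every position it fills.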
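A small sketch makes the "evaluated once" behaviour visible. It wraps runif(1) in a hypothetical helper, draw(), that counts how many times it is called; the helper and the counter n_calls are illustrative names, not part of the original answer.

```r
# Hypothetical counter: how many times is each branch of ifelse() evaluated?
n_calls <- 0
draw <- function() {
  n_calls <<- n_calls + 1  # record every evaluation of this expression
  runif(1)
}

set.seed(1)
res <- ifelse(c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE), draw(), draw())

print(n_calls)              # 2: one evaluation for `yes`, one for `no`
print(length(unique(res)))  # 2: each scalar is recycled over its positions
```

Because both branch values are scalars, ifelse() recycles them, which is exactly the repeated-value pattern shown in the question.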
@BondedDust's answer is correct (i.e., ifelse() doesn't really loop) but slightly inefficient: it samples twice as many random uniform deviates as necessary (in practice this matters only for a huge vector, such as the 500M elements in the question, or when the function is run a huge number of times). Here's a slightly more efficient version that vectorizes over the power (^) operator instead:
set.seed(1001)
f <- function(x, p=2) {
rvec <- runif(length(x))
rvec^(ifelse(x, p, 1/p))
}
## best to avoid the T/F shortcut (T and F are ordinary variables that can be overwritten)
test <- c(TRUE,TRUE,TRUE,FALSE,FALSE,FALSE)
f(test, 2)
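One way to check the "twice as many deviates" claim is to see where each function leaves the random number stream: after a call, the next runif(1) should match the next draw after k plain uniforms from the same seed. The helpers next_draw_after() and next_draw_after_k() are hypothetical names introduced only for this sketch, and it assumes the default RNG settings.

```r
f_bb <- function(x, p = 2) {
  rvec <- runif(length(x))
  rvec^(ifelse(x, p, 1/p))
}
f_bd <- function(x, p = 2)
  ifelse(x, runif(length(x))^p, runif(length(x))^(1/p))

# Hypothetical helper: the next uniform drawn after calling fn(x)
next_draw_after <- function(fn, x, seed = 42) {
  set.seed(seed)
  fn(x)
  runif(1)
}
# Hypothetical helper: the next uniform drawn after k plain draws
next_draw_after_k <- function(k, seed = 42) {
  set.seed(seed)
  runif(k)
  runif(1)
}

test <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
identical(next_draw_after(f_bb, test), next_draw_after_k(6))   # TRUE: 6 draws
identical(next_draw_after(f_bd, test), next_draw_after_k(12))  # TRUE: 12 draws
```

With a test vector that mixes TRUE and FALSE, the ifelse() version evaluates both runif(length(x)) calls, so it consumes 2 * length(x) deviates.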
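@Frank points out in comments that runif(length(x))^(p^(2*x-1)) is even better, although it's a little too clever for my taste: in arithmetic TRUE becomes 1 and FALSE becomes 0, so 2*x-1 is 1 or -1 and the exponent is p or 1/p without calling ifelse() at all.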
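The following sketch spells out the coercion trick and checks that, given the same seed, it produces the same values as the ifelse()-on-the-exponent version. The choice of p = 3 and seed 7 is arbitrary; all.equal() is used because p^(-1) and 1/p might in principle differ in the last bit.

```r
f_bb <- function(x, p = 2) {
  rvec <- runif(length(x))
  rvec^(ifelse(x, p, 1/p))
}
f_frank <- function(x, p = 2) runif(length(x))^(p^(2*x - 1))

test <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

# Arithmetic coerces TRUE to 1 and FALSE to 0, so 2*x - 1 is +1 or -1
print(2 * test - 1)      # 1 1 1 -1 -1 -1
# ... and p^(+1) = p, p^(-1) = 1/p
print(2^(2 * test - 1))  # 2 2 2 0.5 0.5 0.5

# Same seed, same uniform draws: both versions should agree
set.seed(7); a <- f_bb(test, 3)
set.seed(7); b <- f_frank(test, 3)
isTRUE(all.equal(a, b))  # TRUE
```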
fortunes::fortune("7ms")
... Brian Ripley: 'so slow' sic: what are you going to do in the 7ms you saved?
f_bb <- f
f_bd <- function(x, p=2)
ifelse(x, runif( length(x) )^p, runif( length(x) )^(1/p))
f_frank <- function(x,p=2) runif(length(x))^(p^(2*x-1))
library("rbenchmark")
benchmark(f_bb(test),f_bd(test),f_frank(test),replications=10000,
columns=c("test","replications","elapsed","relative"))
## test replications elapsed relative
## 1 f_bb(test) 10000 0.161 2.516
## 2 f_bd(test) 10000 0.199 3.109
## 3 f_frank(test) 10000 0.064 1.000
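The benchmark above times a 6-element vector many times over, whereas the question targets about 500M elements; at that size a single double vector takes roughly 4 GB (8 bytes per element), so the number of full-length intermediate vectors matters as much as raw speed. A rough sketch of a single-call comparison on a larger input, using base R's system.time() instead of rbenchmark, is below; n = 1e6 is an arbitrary choice and the timings will depend on the machine.

```r
f_bb <- function(x, p = 2) {
  rvec <- runif(length(x))
  rvec^(ifelse(x, p, 1/p))
}
f_bd <- function(x, p = 2)
  ifelse(x, runif(length(x))^p, runif(length(x))^(1/p))
f_frank <- function(x, p = 2) runif(length(x))^(p^(2*x - 1))

# Arbitrary larger test input (far smaller than the 500M in the question)
set.seed(1)
big <- sample(c(TRUE, FALSE), 1e6, replace = TRUE)

# Elapsed seconds for one call of each version
timings <- sapply(list(f_bb = f_bb, f_bd = f_bd, f_frank = f_frank),
                  function(fn) system.time(fn(big))[["elapsed"]])
print(timings)
```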