I'm trying to understand why the following 2 code snippets lead to different output and I'm hoping someone can enlighten me. The second version is marginally faster - which is my motivation since this all sits in a loop, but I need to know if it is equivalent and if not then why ?
things <- 100000
n.events <- 3
probs <- rep(0.05, n.events)
# Method 1
set.seed(1)
events1 <- matrix(n.events, nrow=things)
events1 <- sapply(probs, function(probs) rbinom(things,1,probs))
# Method 2
set.seed(1)
events2 <- matrix( rbinom(things*n.events, 1, probs), ncol=n.events, byrow=TRUE)
> identical(events1, events2)
[1] FALSE
To have the same result, you need to put
byrow=FALSE
instead ofbyrow=TRUE
for the second option.the first way you're computing the random matrix, it computes
rbinom(things,1,probs)
length(probs)
times, obtaining each time athings
long vector.To obtain the exact same matrix with the second way, which computes a
length(probs)
*things
long vector, you need to "organize" it inlength(probs)
vectors of lengththings
.(either way, seed is "treated" exactly the same "in between" the computations so the numbers obtained are the same)
EDIT
If you want to use different values for the vector
probs
, you need to computeevents2
in a way that the probs are used in the same "order" than forevents1
so first the first value of vector probs, until the desired length is reached, then the second value, etc. and then, organize the columns of your matrix the same way as before, so usingbyrow=FALSE
.Example