So let's say I want to take the vector X = 2*1:N and raise e to the exponent of each element. (Yes, I recognize the best way to do that is simply by vectorization exp(X), but the point of this is to compare for loop with sapply). Well I tested by incrementally trying three methods (one with for loops, two with sapply applied in a different manner) with different sample sizes and measuring the corresponding time. I then plot the sample size N vs time t for each method.
Each method is indicated by "#####".
k <- 20
t1 <- rep(0,k)
t2 <- rep(0,k)
t3 <- rep(0,k)
L <- round(10^seq(4,7,length=k))
for (i in 1:k) {
X <- 2*1:L[i]
Y1 <- rep(0,L[i])
t <- system.time(for (j in 1:L[i]) Y1[j] <- exp(X[j]))[3] #####
t1[i] <- t
}
for (i in 1:k) {
X <- 2*1:L[i]
t <- system.time( Y2 <- sapply(1:L[i], function(q) exp(X[q])) )[3] #####
t2[i] <- t
}
for (i in 1:k) {
X <- 2*1:L[i]
t <- system.time( Y3 <- sapply(X, function(x) exp(x)) )[3] #####
t3[i] <- t
}
plot(L, t3, type='l', col='green')
lines(L, t2,col='red')
lines(L, t1,col='blue')
plot(log(L), log(t1), type='l', col='blue')
lines(log(L), log(t2),col='red')
lines(log(L), log(t3), col='green')
We get the following results. Plot of N vs t:
Plot of log(N) vs log(t)
The blue plot is the for loop method, and the red and green plots are the sapply methods. In the regular plot, you can see that, as sample size gets larger, the for loop method is heavily favoured over the sapply methods, which is not what I would have expected at all. If you look at the log-log plot (in order to more easily distinguish the smaller N results) we see the expected result of sapply being more efficient than for loop for small N.
Does anybody know why sapply scales more slowly than for loop with sample size? Thanks.
Try taking out the excess function(x) code that runs every iteration. It must have a lot of overhead. I didn't separate the two, but the for loop should also include all associated work for an apples to apples comparison like this:
A much faster sapply:
It's still slower, but much closer than the first two sapply's.
Most of the points have been made before, but...
sapply()
useslapply()
and then pays a one-time cost of formatting the result usingsimplify2array()
.lapply()
creates a long vector, and then a large number of short (length 1) vectors, whereas the for loop generates a single long vector.The
sapply()
as written has an extra function call compared to the for loop.Using
gcinfo(TRUE)
lets us see the garbage collector in action, and each approach results in the garbage collector running several times -- this can be quite expensive, and not completely deterministic.Points 1 - 3 need to be interpreted in the artificial context of the example --
exp()
is a fast function, exaggerating the relative contribution of memory allocation (2), function evaluation (3), and one-time costs (1). Point 4 emphasizes the need to replicate timings in a systematic way.I started by loading the compiler and microbenchmark packages. I focused on the largest size only
In my first experiment I replaced
exp()
with simple assignment, and tried different ways of representing the result in the for loop -- a vector of numeric values, or list of numeric vectors as implied bylapply()
.So it pays to compile the for loop, and there's a fairly substantial cost to generating a list of vectors. Again this memory cost is amplified by the simplicity of the body of the for loop.
My next experiment explored different
*apply()
There is a large cost to the simplification step in
sapply()
;vapply()
is more robust thanlapply()
(I am guaranteed the type of the return) without performance penalty, so it should be my go-to function in this family.Finally, I compared the for iteration to
vapply()
where the result is a list-of-vectors.The compiled for loop and
vapply()
are neck-in-neck; the extra function call almost doubles the execution time ofvapply()
(again, this effect is exaggerated by the simplicity of the example). There does not seem to be much change in relative speed across a range of sizesYou're not accounting for the time it takes to allocate space for the resulting vector
Y1
. As the sample size increases, the time it takes to allocateY1
becomes a larger share of the execution time, and the time it takes to do the replacement becomes a smaller share.sapply
always allocates memory for the the result, so that's one reason it would be less efficient as sample size increases. gagolews also has a very good point aboutsapply
callingsimplify2array
. That (likely) adds another copy.After some more testing, it looks like
lapply
is still about the same or slower than a byte-compiled function containing a for loop, as the objects get larger. I'm not sure how to explain this, other than possibly this line indo_lapply
:Or possibly something with how
lapply
constructs the function call... but I'm mostly speculating.Here's the code I used to test: