So let's say I want to take the vector X = 2*1:N and raise e to the exponent of each element. (Yes, I recognize the best way to do that is simply by vectorization exp(X), but the point of this is to compare for loop with sapply). Well I tested by incrementally trying three methods (one with for loops, two with sapply applied in a different manner) with different sample sizes and measuring the corresponding time. I then plot the sample size N vs time t for each method.
Each method is indicated by "#####".
k <- 20
t1 <- rep(0,k)
t2 <- rep(0,k)
t3 <- rep(0,k)
L <- round(10^seq(4,7,length=k))
for (i in 1:k) {
X <- 2*1:L[i]
Y1 <- rep(0,L[i])
t <- system.time(for (j in 1:L[i]) Y1[j] <- exp(X[j]))[3] #####
t1[i] <- t
}
for (i in 1:k) {
X <- 2*1:L[i]
t <- system.time( Y2 <- sapply(1:L[i], function(q) exp(X[q])) )[3] #####
t2[i] <- t
}
for (i in 1:k) {
X <- 2*1:L[i]
t <- system.time( Y3 <- sapply(X, function(x) exp(x)) )[3] #####
t3[i] <- t
}
plot(L, t3, type='l', col='green')
lines(L, t2,col='red')
lines(L, t1,col='blue')
plot(log(L), log(t1), type='l', col='blue')
lines(log(L), log(t2),col='red')
lines(log(L), log(t3), col='green')
We get the following results.
Plot of N vs t:
Plot of log(N) vs log(t)
The blue plot is the for loop method, and the red and green plots are the sapply methods. In the regular plot, you can see that, as sample size gets larger, the for loop method is heavily favoured over the sapply methods, which is not what I would have expected at all. If you look at the log-log plot (in order to more easily distinguish the smaller N results) we see the expected result of sapply being more efficient than for loop for small N.
Does anybody know why sapply scales more slowly than for loop with sample size? Thanks.
You're not accounting for the time it takes to allocate space for the resulting vector Y1
. As the sample size increases, the time it takes to allocate Y1
becomes a larger share of the execution time, and the time it takes to do the replacement becomes a smaller share.
sapply
always allocates memory for the the result, so that's one reason it would be less efficient as sample size increases. gagolews also has a very good point about sapply
calling simplify2array
. That (likely) adds another copy.
After some more testing, it looks like lapply
is still about the same or slower than a byte-compiled function containing a for loop, as the objects get larger. I'm not sure how to explain this, other than possibly this line in do_lapply
:
if (MAYBE_REFERENCED(tmp)) tmp = lazy_duplicate(tmp);
Or possibly something with how lapply
constructs the function call... but I'm mostly speculating.
Here's the code I used to test:
k <- 20
t1 <- rep(0,k)
t2 <- rep(0,k)
t3 <- rep(0,k)
L <- round(10^seq(4,7,length=k))
L <- round(10^seq(4,6,length=k))
# put the loop in a function
fun <- function(X, L) {
Y1 <- rep(0,L)
for (j in 1:L)
Y1[j] <- exp(X[j])
Y1
}
# for loops often benefit from compiling
library(compiler)
cfun <- cmpfun(fun)
for (i in 1:k) {
X <- 2*1:L[i]
t1[i] <- system.time( Y1 <- fun(X, L[i]) )[3]
}
for (i in 1:k) {
X <- 2*1:L[i]
t2[i] <- system.time( Y2 <- cfun(X, L[i]) )[3]
}
for (i in 1:k) {
X <- 2*1:L[i]
t3[i] <- system.time( Y3 <- lapply(X, exp) )[3]
}
identical(Y1, Y2) # TRUE
identical(Y1, unlist(Y3)) # TRUE
plot(L, t1, type='l', col='blue', log="xy", ylim=range(t1,t2,t3))
lines(L, t2, col='red')
lines(L, t3, col='green')
Most of the points have been made before, but...
sapply()
uses lapply()
and then pays a one-time cost of formatting the result using simplify2array()
.
lapply()
creates a long vector, and then a large number of short (length 1) vectors, whereas the for loop generates a single long vector.
The sapply()
as written has an extra function call compared to the for loop.
Using gcinfo(TRUE)
lets us see the garbage collector in action, and each approach results in the garbage collector running several times -- this can be quite expensive, and not completely deterministic.
Points 1 - 3 need to be interpreted in the artificial context of the example -- exp()
is a fast function, exaggerating the relative contribution of memory allocation (2), function evaluation (3), and one-time costs (1). Point 4 emphasizes the need to replicate timings in a systematic way.
I started by loading the compiler and microbenchmark packages. I focused on the largest size only
library(compiler)
library(microbenchmark)
n <- 10^7
In my first experiment I replaced exp()
with simple assignment, and tried different ways of representing the result in the for loop -- a vector of numeric values, or list of numeric vectors as implied by lapply()
.
fun0n <- function(n) {
Y1 <- numeric(n)
for (j in seq_len(n)) Y1[j] <- 1
}
fun0nc <- compiler::cmpfun(fun0n)
fun0l <- function(n) {
Y1 <- vector("list", n)
for (j in seq_len(n)) Y1[[j]] <- 1
}
fun0lc <- compiler::cmpfun(fun0l)
microbenchmark(fun0n(n), fun0nc(n), fun0lc(n), times=5)
## Unit: seconds
## expr min lq mean median uq max neval
## fun0n(n) 5.620521 6.350068 6.487850 6.366029 6.933915 7.168717 5
## fun0nc(n) 1.852048 1.974962 2.028174 1.984000 2.035380 2.294481 5
## fun0lc(n) 1.644120 2.706605 2.743017 2.998258 3.178751 3.187349 5
So it pays to compile the for loop, and there's a fairly substantial cost to generating a list of vectors. Again this memory cost is amplified by the simplicity of the body of the for loop.
My next experiment explored different *apply()
fun2s <- function(n)
sapply(raw(n), function(i) 1)
fun2l <- function(n)
lapply(raw(n), function(i) 1)
fun2v <- function(n)
vapply(raw(n), function(i) 1, numeric(1))
microbenchmark(fun2s(n), fun2l(n), fun2v(n), times=5)
## Unit: seconds
## expr min lq mean median uq max neval
## fun2s(n) 4.847188 4.946076 5.625657 5.863453 6.130287 6.341282 5
## fun2l(n) 1.718875 1.912467 2.024325 2.141173 2.142004 2.207105 5
## fun2v(n) 1.722470 1.829779 1.847945 1.836187 1.845979 2.005312 5
There is a large cost to the simplification step in sapply()
; vapply()
is more robust than lapply()
(I am guaranteed the type of the return) without performance penalty, so it should be my go-to function in this family.
Finally, I compared the for iteration to vapply()
where the result is a list-of-vectors.
fun1 <- function(n) {
Y1 <- vector("list", n)
for (j in seq_len(n)) Y1[[j]] <- exp(0)
}
fun1c <- compiler::cmpfun(fun1)
fun3 <- function(n)
vapply(numeric(n), exp, numeric(1))
fun3fun <- function(n)
vapply(numeric(n), function(i) exp(i), numeric(1))
microbenchmark(fun1c(n), fun3(n), fun3fun(n), times=5)
## Unit: seconds
## expr min lq mean median uq max neval
## fun1c(n) 2.265282 2.391373 2.610186 2.438147 2.450145 3.505986 5
## fun3(n) 2.303728 2.324519 2.646558 2.380424 2.384169 3.839950 5
## fun3fun(n) 4.782477 4.832025 5.165543 4.893481 4.973234 6.346498 5
microbenchmark(fun1c(10^3), fun1c(10^4), fun1c(10^5),
fun3(10^3), fun3(10^4), fun3(10^5),
times=50)
## Unit: microseconds
## expr min lq mean median uq max neval
## fun1c(10^3) 199 215 230 228 241 279 50
## fun1c(10^4) 1956 2016 2226 2296 2342 2693 50
## fun1c(10^5) 19565 20262 21671 20938 23410 24116 50
## fun3(10^3) 227 244 254 254 264 295 50
## fun3(10^4) 2165 2256 2359 2348 2444 2695 50
## fun3(10^5) 22069 22796 23503 23251 24393 25735 50
The compiled for loop and vapply()
are neck-in-neck; the extra function call almost doubles the execution time of vapply()
(again, this effect is exaggerated by the simplicity of the example). There does not seem to be much change in relative speed across a range of sizes
Try taking out the excess function(x) code that runs every iteration. It must have a lot of overhead. I didn't separate the two, but the for loop should also include all associated work for an apples to apples comparison like this:
t <- system.time(Y1 <- rep(0,L[i])) + system.time(for (j in 1:L[i]) Y1[j] <- exp(X[j]))[3] #####
A much faster sapply:
for (i in 1:k) {
X <- 2*1:L[i]
t <- system.time( Y4 <- sapply(X,exp )[3]) #####
t4[i] <- t
}
It's still slower, but much closer than the first two sapply's.