Why does sapply scale slower than for loop with sa

2019-06-20 11:17发布

So let's say I want to take the vector X = 2*1:N and raise e to the exponent of each element. (Yes, I recognize the best way to do that is simply by vectorization exp(X), but the point of this is to compare for loop with sapply). Well I tested by incrementally trying three methods (one with for loops, two with sapply applied in a different manner) with different sample sizes and measuring the corresponding time. I then plot the sample size N vs time t for each method.

Each method is indicated by "#####".

k <- 20 
t1 <- rep(0,k) 
t2 <- rep(0,k)
t3 <- rep(0,k)
L <- round(10^seq(4,7,length=k))


for (i in 1:k) {
  X <- 2*1:L[i]
  Y1 <- rep(0,L[i])
  t <- system.time(for (j in 1:L[i]) Y1[j] <- exp(X[j]))[3] #####
  t1[i] <- t
}

for (i in 1:k) {
  X <- 2*1:L[i]
  t <- system.time( Y2 <- sapply(1:L[i], function(q) exp(X[q])) )[3] #####
  t2[i] <- t
}

for (i in 1:k) {
  X <- 2*1:L[i]
  t <- system.time( Y3 <- sapply(X, function(x) exp(x)) )[3] #####
  t3[i] <- t
}

plot(L, t3, type='l', col='green')
lines(L, t2,col='red')
lines(L, t1,col='blue')

plot(log(L), log(t1), type='l', col='blue')
lines(log(L), log(t2),col='red')
lines(log(L), log(t3), col='green')

We get the following results. Plot of N vs t: enter image description here

Plot of log(N) vs log(t) enter image description here

The blue plot is the for loop method, and the red and green plots are the sapply methods. In the regular plot, you can see that, as sample size gets larger, the for loop method is heavily favoured over the sapply methods, which is not what I would have expected at all. If you look at the log-log plot (in order to more easily distinguish the smaller N results) we see the expected result of sapply being more efficient than for loop for small N.

Does anybody know why sapply scales more slowly than for loop with sample size? Thanks.

3条回答
ら.Afraid
2楼-- · 2019-06-20 11:44

Try taking out the excess function(x) code that runs every iteration. It must have a lot of overhead. I didn't separate the two, but the for loop should also include all associated work for an apples to apples comparison like this:

t <- system.time(Y1 <- rep(0,L[i])) + system.time(for (j in 1:L[i]) Y1[j] <- exp(X[j]))[3] #####

A much faster sapply:

for (i in 1:k) {
  X <- 2*1:L[i]
  t <- system.time( Y4 <- sapply(X,exp )[3]) #####
  t4[i] <- t
}

It's still slower, but much closer than the first two sapply's.

查看更多
我欲成王,谁敢阻挡
3楼-- · 2019-06-20 12:00

Most of the points have been made before, but...

  1. sapply() uses lapply() and then pays a one-time cost of formatting the result using simplify2array().

  2. lapply() creates a long vector, and then a large number of short (length 1) vectors, whereas the for loop generates a single long vector.

  3. The sapply() as written has an extra function call compared to the for loop.

  4. Using gcinfo(TRUE) lets us see the garbage collector in action, and each approach results in the garbage collector running several times -- this can be quite expensive, and not completely deterministic.

Points 1 - 3 need to be interpreted in the artificial context of the example -- exp() is a fast function, exaggerating the relative contribution of memory allocation (2), function evaluation (3), and one-time costs (1). Point 4 emphasizes the need to replicate timings in a systematic way.

I started by loading the compiler and microbenchmark packages. I focused on the largest size only

library(compiler)
library(microbenchmark)
n <- 10^7

In my first experiment I replaced exp() with simple assignment, and tried different ways of representing the result in the for loop -- a vector of numeric values, or list of numeric vectors as implied by lapply().

fun0n <- function(n) {
    Y1 <- numeric(n)
    for (j in seq_len(n)) Y1[j] <- 1
}
fun0nc <- compiler::cmpfun(fun0n)

fun0l <- function(n) {
    Y1 <- vector("list", n)
    for (j in seq_len(n)) Y1[[j]] <- 1
}
fun0lc <- compiler::cmpfun(fun0l)

microbenchmark(fun0n(n), fun0nc(n), fun0lc(n), times=5)
## Unit: seconds
##       expr      min       lq     mean   median       uq      max neval
##   fun0n(n) 5.620521 6.350068 6.487850 6.366029 6.933915 7.168717     5
##  fun0nc(n) 1.852048 1.974962 2.028174 1.984000 2.035380 2.294481     5
##  fun0lc(n) 1.644120 2.706605 2.743017 2.998258 3.178751 3.187349     5

So it pays to compile the for loop, and there's a fairly substantial cost to generating a list of vectors. Again this memory cost is amplified by the simplicity of the body of the for loop.

My next experiment explored different *apply()

fun2s <- function(n)
    sapply(raw(n), function(i) 1)
fun2l <- function(n)
    lapply(raw(n), function(i) 1)
fun2v <- function(n)
    vapply(raw(n), function(i) 1, numeric(1))

microbenchmark(fun2s(n), fun2l(n), fun2v(n), times=5)
## Unit: seconds
##      expr      min       lq     mean   median       uq      max neval
##  fun2s(n) 4.847188 4.946076 5.625657 5.863453 6.130287 6.341282     5
##  fun2l(n) 1.718875 1.912467 2.024325 2.141173 2.142004 2.207105     5
##  fun2v(n) 1.722470 1.829779 1.847945 1.836187 1.845979 2.005312     5

There is a large cost to the simplification step in sapply(); vapply() is more robust than lapply() (I am guaranteed the type of the return) without performance penalty, so it should be my go-to function in this family.

Finally, I compared the for iteration to vapply() where the result is a list-of-vectors.

fun1 <- function(n) {
    Y1 <- vector("list", n)
    for (j in seq_len(n)) Y1[[j]] <- exp(0)
}
fun1c <- compiler::cmpfun(fun1)

fun3 <- function(n)
    vapply(numeric(n), exp, numeric(1))
fun3fun <- function(n)
    vapply(numeric(n), function(i) exp(i), numeric(1))

microbenchmark(fun1c(n), fun3(n), fun3fun(n), times=5)
## Unit: seconds
##        expr      min       lq     mean   median       uq      max neval
##    fun1c(n) 2.265282 2.391373 2.610186 2.438147 2.450145 3.505986     5
##     fun3(n) 2.303728 2.324519 2.646558 2.380424 2.384169 3.839950     5
##  fun3fun(n) 4.782477 4.832025 5.165543 4.893481 4.973234 6.346498     5

microbenchmark(fun1c(10^3), fun1c(10^4), fun1c(10^5),
               fun3(10^3), fun3(10^4), fun3(10^5),
               times=50)
## Unit: microseconds
##         expr   min    lq  mean median    uq   max neval
##  fun1c(10^3)   199   215   230    228   241   279    50
##  fun1c(10^4)  1956  2016  2226   2296  2342  2693    50
##  fun1c(10^5) 19565 20262 21671  20938 23410 24116    50
##   fun3(10^3)   227   244   254    254   264   295    50
##   fun3(10^4)  2165  2256  2359   2348  2444  2695    50
##   fun3(10^5) 22069 22796 23503  23251 24393 25735    50

The compiled for loop and vapply() are neck-in-neck; the extra function call almost doubles the execution time of vapply() (again, this effect is exaggerated by the simplicity of the example). There does not seem to be much change in relative speed across a range of sizes

查看更多
来,给爷笑一个
4楼-- · 2019-06-20 12:08

You're not accounting for the time it takes to allocate space for the resulting vector Y1. As the sample size increases, the time it takes to allocate Y1 becomes a larger share of the execution time, and the time it takes to do the replacement becomes a smaller share.

sapply always allocates memory for the the result, so that's one reason it would be less efficient as sample size increases. gagolews also has a very good point about sapply calling simplify2array. That (likely) adds another copy.


After some more testing, it looks like lapply is still about the same or slower than a byte-compiled function containing a for loop, as the objects get larger. I'm not sure how to explain this, other than possibly this line in do_lapply:

if (MAYBE_REFERENCED(tmp)) tmp = lazy_duplicate(tmp);

Or possibly something with how lapply constructs the function call... but I'm mostly speculating.

Here's the code I used to test:

k <- 20 
t1 <- rep(0,k) 
t2 <- rep(0,k)
t3 <- rep(0,k)
L <- round(10^seq(4,7,length=k))
L <- round(10^seq(4,6,length=k))

# put the loop in a function
fun <- function(X, L) {
  Y1 <- rep(0,L)
  for (j in 1:L)
    Y1[j] <- exp(X[j])
  Y1
}
# for loops often benefit from compiling
library(compiler)
cfun <- cmpfun(fun)

for (i in 1:k) {
  X <- 2*1:L[i]
  t1[i] <- system.time( Y1 <- fun(X, L[i]) )[3]
}
for (i in 1:k) {
  X <- 2*1:L[i]
  t2[i] <- system.time( Y2 <- cfun(X, L[i]) )[3]
}
for (i in 1:k) {
  X <- 2*1:L[i]
  t3[i] <- system.time( Y3 <- lapply(X, exp) )[3]
}
identical(Y1, Y2)          # TRUE
identical(Y1, unlist(Y3))  # TRUE
plot(L, t1, type='l', col='blue', log="xy", ylim=range(t1,t2,t3))
lines(L, t2, col='red')
lines(L, t3, col='green')
查看更多
登录 后发表回答