I know that loops are slow in R and that I should try to do things in a vectorised manner instead. But why? Why are loops slow and `apply` fast? `apply` calls several sub-functions -- that doesn't seem fast.

Update: I'm sorry, the question was ill-posed. I was confusing vectorisation with `apply`. My question should have been, "Why is vectorisation faster?"
It's not always the case that loops are slow and `apply` is fast. There's a nice discussion of this in the May 2008 issue of R News. In the section "Loops!" (starting on pg 48), the authors point out that loops are not necessarily bad; among other things, they suggest initialising objects to their full length before the loop rather than growing them, and keeping work that does not change across iterations outside the loop. They also have a simple example where a `for` loop takes 1.3 sec but `apply` runs out of memory.

Just as a comparison (don't read too much into it!): I ran a (very) simple for loop in R and in JavaScript in Chrome and IE 8. Note that Chrome does compilation to native code, and R with the compiler package compiles to bytecode.
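A sketch of the R side of that comparison (the loop body and the iteration count are my own choices; the original timings came from a similarly trivial loop):

```r
# A deliberately trivial loop: accumulate a running value a few million times.
# cmpfun() from the base 'compiler' package byte-compiles the function, which
# is R's rough analogue of the compilation the browsers do.
library(compiler)

f <- function(n) {
  s <- 0
  for (i in 1:n) s <- s + i / 2
  s
}
fc <- cmpfun(f)          # byte-compiled version of the same function

system.time(f(1e7))      # plain interpreter
system.time(fc(1e7))     # byte-code compiled
```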
@Gavin Simpson: Btw, it took 1162 ms in S-Plus...
And the "same" code as JavaScript:
Loops in R are slow for the same reason any interpreted language is slow: every operation carries around a lot of extra baggage.

Look at `R_execClosure` in `eval.c` (this is the function called to call a user-defined function). It's nearly 100 lines long and performs all sorts of operations -- creating an environment for execution, assigning arguments into the environment, etc. Think how much less happens when you call a function in C (push args onto the stack, jump, pop args).
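As a rough illustration (a sketch; the trivial helper `add1` and the iteration count are my own choices), compare a loop that does its arithmetic inline with one that routes the same arithmetic through a user-defined function, so every iteration pays the closure-call machinery described above:

```r
add1 <- function(x) x + 1

system.time(for (i in 1:1e6) y <- i + 1)     # arithmetic done inline
system.time(for (i in 1:1e6) y <- add1(i))   # same arithmetic, plus a closure call per iteration
```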
So that is why you get timings like these (as joran pointed out in the comment, it's not actually `apply` that's being fast; it's the internal C loop in `mean` that's being fast. `apply` is just regular old R code):

Using a loop: 0.342 seconds:
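(A sketch of the sort of loop that gives a timing in that ballpark; the vector length of ten million is my assumption.)

```r
# Explicit R-level accumulation over ten million values.
x <- as.numeric(1:1e7)

system.time({
  s <- 0
  for (xi in x) s <- s + xi
})
```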
Using sum: unmeasurably small:
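(Again a sketch, with `x` as in the previous snippet.)

```r
# The entire loop runs in C inside sum(), so the elapsed time is too small
# to measure reliably with system.time().
system.time(s2 <- sum(x))
```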
It's a little disconcerting because, asymptotically, the loop is just as good as `sum`; there's no practical reason it should be slow; it's just doing more extra work each iteration.

So consider:
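(A reconstruction of the sort of example meant here; the iteration count and the particular expression are my assumptions.) Two functions compute exactly the same value, but one wraps the expression in redundant parentheses:

```r
# Identical arithmetic, with and without redundant parentheses.  In the plain
# interpreter each extra `(` is one more lookup-and-call per iteration, so g()
# does measurably more work than f() for the same result.
# (With R's byte-code compiler enabled, the gap may shrink considerably.)
f <- function(n) { x <- 0.5; for (i in 1:n) x <- 1 / (1 + x);       x }
g <- function(n) { x <- 0.5; for (i in 1:n) x <- 1 / ((((1 + x)))); x }

system.time(f(5e6))
system.time(g(5e6))
```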
(That example was discovered by Radford Neal)
Because `(` in R is an operator, and actually requires a name lookup every time you use it (see the sketch below). Or, in general, interpreted operations (in any language) have more steps. Of course, those steps provide benefits as well: you couldn't do that `(` trick in C.
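A minimal sketch of the point (the output shown in comments is what I'd expect from a plain interactive session; redefining `(` is a toy demonstration only):

```r
# Parentheses are a call to the function `(`, which is looked up by name.
`(`
#> .Primitive("(")

# Because the lookup happens at run time, `(` can even be shadowed --
# the sort of trick a compiled language like C cannot offer.
`(` <- function(x) 42
(1 + 1)
#> [1] 42
rm("(")   # restore the built-in behaviour
```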
The only Answer to the Question posed is: loops are not slow if what you need to do is iterate over a set of data, performing some function, and that function or the operation is not vectorized. A `for()` loop will, in general, be as quick as `apply()`, but possibly a little bit slower than an `lapply()` call. The last point is well covered on SO, for example in this Answer, and applies if the code involved in setting up and operating the loop is a significant part of the overall computational burden of the loop.
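A rough way to compare these yourself (a sketch: the toy task of squaring each element, the vector size, and the use of `sapply()` as the apply-family representative are all my choices, and results vary by machine and R version):

```r
x <- rnorm(1e5)

system.time({                       # for() loop writing into preallocated output
  out <- numeric(length(x))
  for (i in seq_along(x)) out[i] <- x[i]^2
})

system.time(out2 <- sapply(x, function(xi) xi^2))  # apply family (simplifies the result)
system.time(out3 <- lapply(x, function(xi) xi^2))  # lapply: returns a list, skips simplification
```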
Why many people think `for()` loops are slow is because they, the user, are writing bad code. In general (though there are several exceptions), if you need to expand/grow an object, that will involve copying, so you have both the overhead of copying and of growing the object. This is not restricted to loops, but if you copy/grow at each iteration of a loop, of course the loop is going to be slow, because you are incurring many copy/grow operations.
The general idiom for using `for()` loops in R is to allocate the storage you require before the loop starts and then fill in the object thus allocated. If you follow that idiom, loops will not be slow. This is what `apply()` manages for you, but it is just hidden from view.
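A sketch of the two styles (the vector length and the trivial loop body are my choices):

```r
n <- 1e5

# Bad: grow the result at every iteration, forcing repeated copies.
bad <- function(n) {
  out <- numeric(0)
  for (i in 1:n) out <- c(out, i^2)
  out
}

# Good: allocate the full-length result once, then fill it in.
good <- function(n) {
  out <- numeric(n)
  for (i in 1:n) out[i] <- i^2
  out
}

system.time(bad(n))
system.time(good(n))
```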
Of course, if a vectorised function exists for the operation you are implementing with the `for()` loop, use that instead. Likewise, don't use `apply()` etc. if a vectorised function exists (e.g. `apply(foo, 2, mean)` is better performed via `colMeans(foo)`).
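For instance (a sketch; `foo` and its dimensions are placeholders of my choosing):

```r
foo <- matrix(rnorm(1e6), nrow = 1000, ncol = 1000)

system.time(m1 <- apply(foo, 2, mean))  # R-level loop over the columns
system.time(m2 <- colMeans(foo))        # single call to compiled code
all.equal(m1, m2)                       # same result either way
```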