I've been learning R for a while now, and have come across a lot of advice for programming types like myself to vectorize operations. Being a programmer, I'm interested in why and how it's faster. An example:
n = 10^7
# populate with random nos
v=runif(n)
system.time({vv<-v*v; m<-mean(vv)}); m
system.time({for(i in 1:length(v)) { vv[i]<-v[i]*v[i] }; m<-mean(vv)}); m
This gave
user system elapsed
0.04 0.01 0.07
[1] 0.3332091
user system elapsed
36.68 0.02 36.69
[1] 0.3332091
The most obvious thing to consider is that we're running native code, i.e. machine code compiled from C or C++, rather than interpreted code, as shown by the massive difference in user time between the two examples (circa 3 orders of magnitude). But is there anything else going on? For example, does R do:
Cunning native data structures, e.g. clever ways of storing sparse vectors or matrices so that we only do multiplications when we need to?
Lazy evaluation, e.g. on a matrix multiply, not evaluating cells until they are actually needed.
Parallel processing.
Something else.
To test whether there might be some sparse-vector optimization, I tried dot products with different vector contents:
# populate with random nos
v<-runif(n)
system.time({m<-v%*%v/n}); m
# populate with runs of 1 followed by 99 0s
v <- rep(c(1, rep(0, 99)), n/100)
system.time({m<-v%*%v/n}); m
# populate with 0s
v <-rep(0,n)
system.time({m<-v%*%v/n}); m
However, there was no significant difference in time (circa 0.09 elapsed in each case).
(Similar question for Matlab: Why does vectorized code run faster than for loops in MATLAB?)
You are running both interpreted code and native code in both examples. The difference is that in the second example you are doing the loop at the R level, resulting in many more function calls that all need to be interpreted before the underlying C code is called. In your first example, the loop happens inside the compiled code, so R has far less to interpret: far fewer R-level calls and far fewer calls into compiled code.
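To get a rough feel for where the time goes, here is a small sketch (the square_loop helper is just an illustrative name, and the byte-compiler remark assumes R >= 3.4, where JIT compilation of closures is on by default):

n <- 10^7
v <- runif(n)
vv <- numeric(n)

# Vectorized: a single call to `*`; the loop over all n elements runs in C.
system.time(vv <- v * v)

# R-level loop: every iteration is interpreted and dispatches several R
# functions (`[`, `*`, `[<-`), so on the order of 3 * n calls go through
# the interpreter before any C code runs.
system.time(for (i in seq_along(v)) vv[i] <- v[i] * v[i])

# Wrapping the loop in a function lets the byte-code compiler (JIT, on by
# default since R 3.4) remove some of that interpretation overhead, but
# the per-element work still happens at the R level.
square_loop <- function(v) {
  vv <- numeric(length(v))
  for (i in seq_along(v)) vv[i] <- v[i] * v[i]
  vv
}
system.time(vv <- square_loop(v))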
That's most of it. The other big-ish component is that since R is functional in its design paradigm, functions (attempt to) have no side effects, which means that in some (but perhaps not all; R does try to be efficient about this) instances calling the subassignment function [<- inside a for loop results in copying the entire object. That can get slow.

A small side note: R does have rather extensive functionality for handling sparse matrix structures efficiently, but they aren't the default.
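Both points are easy to check. A small sketch, assuming tracemem() is available (it is in the standard CRAN builds) and that the recommended Matrix package is installed:

# Copy-on-modify: once a vector is shared, the next single-element
# assignment via [<- duplicates the whole object.
x <- runif(10^6)
y <- x            # x and y now refer to the same memory
tracemem(x)       # report whenever x is copied
x[1] <- 0         # tracemem reports a copy: all 10^6 elements get duplicated
untracemem(x)

# Sparse structures exist, but are opt-in via the Matrix package.
library(Matrix)
s <- sparseMatrix(i = c(1, 5000), j = c(2, 7000), x = c(1.5, 2.5),
                  dims = c(10000, 10000))
print(object.size(s), units = "Kb")   # only the nonzero entries are stored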
In regard to parallel processing, out of the box R does not do any. There is, of course, the built-in parallel package, but you have to adapt your code, e.g. by using mclapply, to take advantage of it. There are also options to have your linear algebra computed in parallel by linking R against a multithreaded BLAS, but that is not the standard setup in R, although getting it to work does not seem that hard.
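For completeness, a minimal sketch of what "adapting your code" to mclapply might look like; the chunking scheme here is purely illustrative, and since mclapply relies on forking, mc.cores must stay at 1 on Windows:

library(parallel)

n <- 10^7
v <- runif(n)

# Split the vector into a few chunks and square each chunk in its own process.
chunks <- split(v, cut(seq_along(v), 4, labels = FALSE))
res <- mclapply(chunks, function(x) x * x, mc.cores = 4)
vv <- unlist(res, use.names = FALSE)

For something as cheap as an element-wise multiply, the forking and copying overhead will usually eat any gain, so this pays off mainly when the work done per chunk is substantial.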