Why is vectorization faster

2019-01-24 19:44发布

问题:

I've been learning R for a while now, and have come across a lot of advice to programming types like myself to vectorize operations. Being a programmer, I'm interested as to why / how it's faster. An example:

n = 10^7
# populate with random nos
v=runif(n)
system.time({vv<-v*v; m<-mean(vv)}); m
system.time({for(i in 1:length(v)) { vv[i]<-v[i]*v[i] }; m<-mean(vv)}); m

This gave

   user  system elapsed 
   0.04    0.01    0.07 
[1] 0.3332091

   user  system elapsed 
  36.68    0.02   36.69 
[1] 0.3332091

The most obvious thing to consider is that we're running native code, i.e. machine code compiled from C or C++, rather than interpreted code, as shown by the massive difference in user time between the two examples (circa 3 orders of magnitude). But is there anything else going on? For example, does R do:

  • Cunning native data structures, e.g. clever ways of storing sparse vectors or matrices so that we only do multiplications when we need to?

  • Lazy evaluation, e.g. on a matrix multiply, don't evaluate cells until as and when you need to.

  • Parallel processing.

  • Something else.

To test whether there might be some sparse vector optimization I tried doing dot products with difference vector contents

# populate with random nos
v<-runif(n)
system.time({m<-v%*%v/n}); m
# populate with runs of 1 followed by 99 0s
v <-rep(rep(c(1,rep(0,99)),n/100))
system.time({m<-v%*%v/n}); m
# populate with 0s
v <-rep(0,n)
system.time({m<-v%*%v/n}); m

However there was no significant difference in time (circa 0.09 elapsed)

(Similar question for Matlab: Why does vectorized code run faster than for loops in MATLAB?)

回答1:

The most obvious thing to consider is that we're running native code, i.e. machine code compiled from C or C++, rather than interpreted code

That's most of it. The other big-ish component is that since R code is functional in its design paradigm, functions (attempt to) have no side effects, which means that in some (but perhaps not all; R does try to be efficient about this) instances calling [<- in side a for loop results in having to copy the entire object. That can get slow.

A small side note: R does have rather extensive functionality for handling sparse matrix structures efficiently, but they aren't the "default".



回答2:

You are running both interpreted code and native code in both examples. The difference is that in the second you are doing the loop at the R level resulting in many more function calls that all need to be interpreted, and then the C code called. In your first example, the loop happens within the compiled code and hence R has far less to interpret, far fewer R code calls and far fewer calls to compiled code.



回答3:

In regard to parallel processing, out-of-the-box R does not do any parallel processing. Of course there is the built-in parallel package, but you have to adapt your code to using e.g. mclapply to use parallel processing. There are options to let your linear algebra be calculated in parallel using a special version of blas, but this is not standardly using in R, although getting it to work does not seems that hard.