I know that using a for loop is considered bad practice in R because of its poor performance. In almost every case there is a function from the *apply family that solves the problem.
However, I'm facing a situation where I don't see a workaround.
I need to calculate the percent variation between consecutive values:
pv <- numeric(length(x))  # preallocate the result vector
pv[1] <- 0                # no previous value for the first element
for (i in 2:length(x)) {
  # relative change from the previous element to the current one
  pv[i] <- (x[i] - x[i-1]) / x[i-1]
}
So, as you can see, I have to use not only the x[i] element but also the x[i-1] element. With the *apply functions, I only see how to use x[i]. Is there any way I can avoid the for loop?
What you offered would be the fractional variation; if you multiply by 100 you get the "percent variation".
Vectorized solution. (And you should note that for loops are going to be just as slow as *apply solutions ... just not as pretty. Always look for a vectorized approach.)
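A minimal sketch of that vectorized computation, reusing the question's x (the explanation below walks through exactly these subsetting operations):

pv <- c(0, (x[-1] - x[-length(x)]) / x[-length(x)])  # fractional variation, 0 prepended to match pv[1]
100 * pv                                             # percent variation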
To explain a bit more: the x[-length(x)] is the vector x[1:(length(x)-1)], and the x[-1] is the vector x[2:length(x)]. The vector operations in R are doing the same operations as in your for-loop body, although without an explicit loop: R first constructs the differences of those shifted vectors, x[-1] - x[-length(x)], and then divides by x[1:(length(x)-1)]. You can get the same results with:
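A sketch of one equivalent form (an assumption here, relying on the algebraic identity (x[i] - x[i-1])/x[i-1] = x[i]/x[i-1] - 1):

x[-1] / x[-length(x)] - 1  # same fractional variation via the ratio identity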
The for loop issues that were once a problem have largely been optimized away. Often a for loop is not slower than, and may even be faster than, the *apply solution. You have to test them both and see. I'm betting your for loop is faster than my solution (sketched below).
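A sketch of what such an *apply solution might look like (the function shape is an assumption, not the answerer's original code):

# hypothetical sapply version: index over 2:length(x), looking back one element
pv_sapply <- function(x) c(0, sapply(2:length(x), function(i) (x[i] - x[i-1]) / x[i-1]))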
EDIT: Just to illustrate the for loop vs. *apply comparison, as well as what DWin discusses about vectorization, I ran benchmarks on the four solutions using microbenchmark on a Windows 7 machine.
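A sketch of how such a benchmark can be set up (wrapper names and test data are illustrative; no timing output is claimed here):

library(microbenchmark)

x <- rnorm(1000)^2 + 100  # hypothetical test vector, kept away from zero

pv_for <- function(x) {
  pv <- numeric(length(x))
  for (i in 2:length(x)) pv[i] <- (x[i] - x[i-1]) / x[i-1]
  pv
}
pv_sapply     <- function(x) c(0, sapply(2:length(x), function(i) (x[i] - x[i-1]) / x[i-1]))
pv_vectorized <- function(x) c(0, (x[-1] - x[-length(x)]) / x[-length(x)])
pv_diff       <- function(x) c(0, diff(x) / x[-length(x)])

microbenchmark(pv_for(x), pv_sapply(x), pv_vectorized(x), pv_diff(x), times = 100L)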
You can also use diff:
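A minimal sketch, using the fact that diff(x) returns the consecutive differences x[2:length(x)] - x[1:(length(x)-1)]:

pv <- c(0, diff(x) / x[-length(x)])  # fractional variation; multiply by 100 for percent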