Hi, I'm trying to manipulate a vector of numbers, and I would like to do so without a for loop, using fast native (vectorized) operations in R. The pseudocode for the manipulation is:
By default, the starting total is 100 (for every block of numbers between zeros).
From the first zero to the next zero, the moment the cumulative total falls more than 2% below its highest value so far, replace all subsequent numbers in that block with zero.
Do this for all blocks of numbers between zeros.
The cumulative sum resets to 100 at the start of every block.
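The rule for a single block can be illustrated with base R's cumsum() and cummax(): the running total is 100 plus the cumulative sum, its peak is the running maximum (counting the starting 100), and every element after the first total that sits more than 2% below its peak is zeroed. This is a minimal illustrative sketch for one block only; the helper name apply_block_rule and its start/limit arguments are hypothetical.

```r
# Hypothetical helper: apply the 2% drawdown rule to ONE block of non-zero numbers.
apply_block_rule <- function(x, start = 100, limit = 0.98) {
  # running total after each element, starting from `start`
  totals <- start + cumsum(x)
  # highest total seen so far, including the starting value
  peak <- cummax(c(start, totals))[-1]
  # positions where the total is more than 2% below its peak
  hit <- which(totals < limit * peak)
  # zero everything after the first such position (if it is not the last element)
  if (length(hit) > 0 && hit[1] < length(x)) {
    x[(hit[1] + 1):length(x)] <- 0
  }
  x
}

apply_block_rule(c(1, 3, 4, 5, -1, 2, 3, -5, 8))
# returns c(1, 3, 4, 5, -1, 2, 3, -5, 0)
```

It handles one block in isolation; splitting the full vector at its zeros is not shown. Note that cumsum() may accumulate in extended precision, so with fractional data a total landing exactly on the 2% boundary could be classified differently than by a plain running sum.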
For example, if the following were my data:
d <- c(0,0,0,1,3,4,5,-1,2,3,-5,8,0,0,-2,-3,3,5,0,0,0,-1,-1,-1,-1)
The result would be:
0 0 0 1 3 4 5 -1 2 3 -5 0 0 0 -2 -3 0 0 0 0 0 -1 -1 -1 0
Currently I have an implementation with a for loop, but since my vector is really long, the performance is terrible.
Thanks in advance.
Here is runnable sample code:
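d <- c(0,0,0,1,3,4,5,-1,2,3,-5,8,0,0,-2,-3,3,5,0,0,0,-1,-1,-1,-1)
ans <- d
running_total <- 100
count <- 1             # position of the current element in d
peak <- 100            # highest running total in the current block (named peak to avoid masking base::max)
toggle <- FALSE        # TRUE once the total has fallen more than 2% below its peak
processing <- FALSE    # TRUE once the first non-zero value has been seen
for (i in d) {
  if (i != 0) {
    processing <- TRUE
    if (toggle) {
      # limit already hit: zero out the rest of this block
      ans[count] <- 0
    } else {
      running_total <- running_total + i
      if (running_total > peak) {
        peak <- running_total
      } else if (0.98 * peak > running_total) {
        toggle <- TRUE
      }
    }
  }
  if (i == 0 && processing) {
    # a zero ends the block: reset the total, the peak and the flag
    running_total <- 100
    peak <- 100
    toggle <- FALSE
  }
  count <- count + 1
}
cat(ans)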
I am not sure how to translate your loop into vectorized operations. However, there are two fairly easy options for large performance improvements. The first is simply to put your loop into an R function and use the compiler package to precompile it. The second, slightly more complicated, option is to translate your R loop into a C++ loop and use the Rcpp package to link it to an R function. You then call an R function that passes the data to C++ code, which is fast. I show both options along with their timings. I want to gratefully acknowledge the help of Alexandre Bujard from the Rcpp listserv, who helped me with a pointer issue I did not understand.

First, wrap your R loop in a function, foo.r. Then load the compiler package, compile the function, and call the compiled version foo.rcomp.
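A minimal sketch of these two steps, assuming foo.r is simply the question's loop wrapped in a function (with the index taken from seq_along() instead of a separate counter, and max renamed to peak so it does not mask base::max()). cmpfun() is the compiler package's function for byte-compiling an R closure.

```r
# Sketch: the question's loop as a function (assumed shape of foo.r)
foo.r <- function(d) {
  ans <- d
  running_total <- 100
  peak <- 100           # highest running total in the current block
  toggle <- FALSE       # TRUE once the 2% drawdown has been hit
  processing <- FALSE
  for (k in seq_along(d)) {
    x <- d[k]
    if (x != 0) {
      processing <- TRUE
      if (toggle) {
        ans[k] <- 0     # zero the rest of the block
      } else {
        running_total <- running_total + x
        if (running_total > peak) {
          peak <- running_total
        } else if (0.98 * peak > running_total) {
          toggle <- TRUE
        }
      }
    }
    if (x == 0 && processing) {   # a zero ends the block: reset
      running_total <- 100
      peak <- 100
      toggle <- FALSE
    }
  }
  ans
}

# Byte-compile the function with the compiler package
library(compiler)
foo.rcomp <- cmpfun(foo.r)

d <- c(0,0,0,1,3,4,5,-1,2,3,-5,8,0,0,-2,-3,3,5,0,0,0,-1,-1,-1,-1)
cat(foo.rcomp(d), "\n")
# 0 0 0 1 3 4 5 -1 2 3 -5 0 0 0 -2 -3 0 0 0 0 0 -1 -1 -1 0
```

One caveat: since R 3.4.0 the byte-code JIT compiler is enabled by default, so loops inside functions are usually compiled automatically, and an explicit cmpfun() call may give a much smaller speedup on a current R than the one reported here.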
That is all it takes for the compilation route. It is all R and obviously very easy.

Now for the C++ approach, we use the Rcpp package as well as the inline package, which allows us to "inline" the C++ code. That is, we do not have to make a source file and compile it; we just include the C++ code in the R code, and the compilation is handled for us.

Now we can test that we get the expected results.
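A sketch of the C++ route, with one substitution: it uses Rcpp::cppFunction(), which compiles a C++ function given as a string and makes it callable from R, instead of the inline package described above. The function name foo_cpp is hypothetical (a C++ identifier cannot contain the dot in foo.r), the body simply mirrors the question's loop, and a working C++ toolchain is required. The last lines check the result against the expected output from the question.

```r
library(Rcpp)

# Compile a C++ port of the loop and expose it to R as foo_cpp()
cppFunction('
NumericVector foo_cpp(NumericVector d) {
  // clone() makes a deep copy, so the R vector passed in is not modified
  NumericVector ans = clone(d);
  double running_total = 100.0;
  double peak = 100.0;          // highest running total in the current block
  bool toggle = false;          // true once the 2% drawdown has been hit
  bool processing = false;
  R_xlen_t n = d.size();
  for (R_xlen_t k = 0; k < n; ++k) {
    double x = d[k];
    if (x != 0) {
      processing = true;
      if (toggle) {
        ans[k] = 0;             // zero the rest of the block
      } else {
        running_total += x;
        if (running_total > peak) {
          peak = running_total;
        } else if (0.98 * peak > running_total) {
          toggle = true;
        }
      }
    }
    if (x == 0 && processing) { // a zero ends the block: reset
      running_total = 100.0;
      peak = 100.0;
      toggle = false;
    }
  }
  return ans;
}')

d <- c(0,0,0,1,3,4,5,-1,2,3,-5,8,0,0,-2,-3,3,5,0,0,0,-1,-1,-1,-1)
expected <- c(0,0,0,1,3,4,5,-1,2,3,-5,0,0,0,-2,-3,0,0,0,0,0,-1,-1,-1,0)
stopifnot(identical(foo_cpp(d), expected))
```

The clone() call matters: an Rcpp NumericVector shares memory with the R object it wraps, so writing into d directly would also change the vector the caller passed in.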
Finally, create a much larger version of d by repeating it 10e4 (that is, 100,000) times, which gives a vector of 2.5 million elements. Then we can run the three different functions: pure R code, compiled R code, and an R function linked to C++ code.

On my system, the compiled R code takes about 1/6 of the time of the uncompiled R code, needing only 2 seconds to process the vector of 2.5 million elements. The C++ code is orders of magnitude faster even than the compiled R code, requiring just 0.02 seconds to complete. Aside from the initial setup, the syntax of the basic loop is nearly identical in R and C++, so you do not even lose clarity. I suspect that even if parts or all of your loop could be vectorized in R, you would be hard pressed to beat the performance of the R function linked to C++. Lastly, just for proof: the different functions return the same results.