I'd like some advice on writing better R code. I have written a loop in R that suffers from performance issues.
I can't wrap my head around vectorizing it, because each row in the output data frame depends on earlier rows and the changes trickle down iteratively, so I wrote a loop that reads and writes the rows in sequence.
An example of my code:
example <- data.frame(a = c(.5, .1, .5, .25), b = c(1, 0, 2, 0), c = c(1, 2, 3, 4), d = c(4, 3, 2, 1))
for (i in 2:nrow(example)) {
  if (example[i, 1] > 0) {
    example[i, 2] <- example[i, 2] + example[i - 1, 2] * example[i, 1]
    example[i, 3] <- example[i, 3] + example[i - 1, 3] * example[i, 1]
    example[i, 4] <- example[i, 4] + example[i - 1, 4] * example[i, 1]
  }
}
To see what's happening:
# before
     a b c d
1 0.50 1 1 4
2 0.10 0 2 3
3 0.50 2 3 2
4 0.25 0 4 1
# after
     a      b      c     d
1 0.50 1.0000 1.0000 4.000
2 0.10 0.1000 2.1000 3.400
3 0.50 2.0500 4.0500 3.700
4 0.25 0.5125 5.0125 1.925
I'm not sure how to avoid the row-by-row operations, but here are three pieces of advice that should improve performance by roughly 90x
In other words, try converting your code to
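A minimal sketch of such a conversion, assuming the three changes are: work on a matrix rather than a data frame, evaluate the `a > 0` condition once outside the loop, and update all remaining columns in a single assignment. These choices, and the names `m`, `rows` and `result`, are illustrative assumptions rather than a definitive rewrite.

```r
example <- data.frame(a = c(.5, .1, .5, .25),
                      b = c(1, 0, 2, 0),
                      c = c(1, 2, 3, 4),
                      d = c(4, 3, 2, 1))

# Assumption 1: index a numeric matrix instead of a data frame,
# since single-element access on a matrix is much cheaper.
m <- as.matrix(example)

# Assumption 2: test the condition once for all rows instead of
# calling `if` inside the loop. Column `a` is never modified, so
# the result is the same. The + 1L shifts back to original row numbers.
rows <- which(m[-1, 1] > 0) + 1L

# Assumption 3: update every column except the first in one step.
# The loop itself stays, because row i still depends on row i - 1.
for (i in rows) {
  m[i, -1] <- m[i, -1] + m[i - 1, -1] * m[i, 1]
}

result <- as.data.frame(m)
print(result)
```

Because the update uses `-1` rather than explicit column numbers, the same loop body works however many columns follow `a`.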
Also, note that this solution generalizes to any number of columns
Benchmark
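One way to reproduce a comparison yourself is sketched below, using only base R's `system.time`. The function names `loop_df` and `loop_matrix`, the test data and the size `n` are hypothetical choices for illustration; actual timings will depend on your machine and data, so none are quoted here.

```r
set.seed(1)
n <- 2000  # hypothetical size, increase it to make the gap more visible
big <- data.frame(a = runif(n), b = runif(n), c = runif(n), d = runif(n))

# The loop from the question, wrapped in a function.
loop_df <- function(df) {
  for (i in 2:nrow(df)) {
    if (df[i, 1] > 0) {
      df[i, 2] <- df[i, 2] + df[i - 1, 2] * df[i, 1]
      df[i, 3] <- df[i, 3] + df[i - 1, 3] * df[i, 1]
      df[i, 4] <- df[i, 4] + df[i - 1, 4] * df[i, 1]
    }
  }
  df
}

# A matrix-based variant (an assumed form of the suggested conversion).
loop_matrix <- function(df) {
  m <- as.matrix(df)
  for (i in which(m[-1, 1] > 0) + 1L) {
    m[i, -1] <- m[i, -1] + m[i - 1, -1] * m[i, 1]
  }
  as.data.frame(m)
}

# Check that both versions agree before comparing their speed.
same <- isTRUE(all.equal(as.matrix(loop_df(big)), as.matrix(loop_matrix(big))))

timing_df <- system.time(loop_df(big))
timing_matrix <- system.time(loop_matrix(big))

# Elapsed seconds for each version.
print(c(data.frame = timing_df[["elapsed"]], matrix = timing_matrix[["elapsed"]]))
```

For more stable numbers, repeat each call several times and compare the medians rather than relying on a single run.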