Apply function too slow in r

2019-07-20 22:37发布

问题:

I have to calculate for a lot of species a specific formula per row. The formula is a product between a value of abundance and a value present in the last row of the data frame. Then, all these products are summed.

My current script consists in using an apply function which appears to be as slow as the for-loop I started with. I simplified the problem in the following script, using a simple df called az :

az=data.frame(c(1,2,10),c(2,4,20),c(3,6,30))
colnames(az)=c("a","b","c")


# Initial for loop
prov=0 # prov for provisional number
    for (i in 1:nrow(az)){
            for (j in 1:ncol(az)){
                   prov=prov+az[i,j]*az[nrow(az),j]
            }
        print(prov)
        prov=0
        }

# Apply solution
apply(az[,], 1, function(x) {sum(x*az[nrow(az),], na.rm=TRUE)})

Both solutions work but they are quite slow (with my original df) and I have to repeat the operation for a huge number of species. Thus, I was wondering if anyone has a more efficient solution, maybe using vectorized expressions.

Kind regards.

回答1:

The fastest solution is probably matrix algebra:

apply(az[,], 1, function(x) {sum(x*az[nrow(az),], na.rm=TRUE)})
#[1]  140  280 1400

m <- as.matrix(az)
m[is.na(m)] <- 0 #remove NA from sums
as.vector(m %*% m[nrow(m),])
#[1]  140  280 1400

回答2:

Try

  rowSums(az*unlist(az[nrow(az),])[col(az)], na.rm=TRUE)

Or a slightly faster option would be to use rep

  rowSums(az*rep(unlist(az[nrow(az),]),each=ncol(az)), na.rm=TRUE)