Avoid the use of for loops

Posted 2020-06-06 07:15

Question:

I'm working with R and I have code like this:

for (i in 1:10)
    for (j in 1:100)
        if (data[i] == paths[j,1])
            cluster[i,4] <- paths[j,2]

where:

  • data is a vector with 100 rows and 1 column
  • paths is a matrix with 100 rows and 5 columns
  • cluster is a matrix with 100 rows and 5 columns

My question is: how could I avoid the use of "for" loops to iterate through the matrix? I don't know whether apply functions (lapply, tapply...) are useful in this case.

This becomes a problem when j goes up to 10000, for example, because the execution time is very long.

Thank you

Answer 1:

The inner loop could be vectorized:

cluster[i,4] <- paths[max(which(data[i]==paths[,1])),2]

but check Musa's comment; I think you intended something else.
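
To see how the which()/max() line relates to the original inner loop, here is a minimal sketch on toy data (the values are invented for illustration and are not from the question). One caveat it covers: when data[i] has no match in paths[,1], which() returns an empty vector and max() of that is -Inf, so the sketch guards that case and leaves the row untouched, as the loop would.

```r
# Toy data, invented for illustration only
data  <- c("a", "b", "z")
paths <- cbind(c("a", "b", "a", "c"), c("p1", "p2", "p3", "p4"))
cluster_loop <- matrix(NA_character_, nrow = 3, ncol = 5)
cluster_vec  <- cluster_loop

# Original double loop: for repeated keys, the last matching row wins
for (i in seq_along(data)) {
  for (j in seq_len(nrow(paths))) {
    if (data[i] == paths[j, 1]) {
      cluster_loop[i, 4] <- paths[j, 2]
    }
  }
}

# Inner loop replaced by which(); max() picks the last matching row
for (i in seq_along(data)) {
  hits <- which(data[i] == paths[, 1])
  if (length(hits) > 0) {          # skip values with no match
    cluster_vec[i, 4] <- paths[max(hits), 2]
  }
}

same_result <- identical(cluster_loop, cluster_vec)
```

Taking max(hits) rather than the first hit is what keeps the result in line with the loop, where a later j overwrites an earlier one.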

The second (outer) loop could be vectorized too, by replicating vectors, but:

  1. if i only goes up to 100, the speed-up won't be large
  2. it will need more RAM
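
One possible reading of "replicating vectors" is to build the full comparison matrix with outer(). The sketch below is only an illustration on invented toy data, not necessarily what the answerer had in mind; it shows where the extra RAM goes, since the intermediate matrix has length(data) x nrow(paths) entries.

```r
# Toy data, invented for illustration only
data  <- c("a", "b", "z")
paths <- cbind(c("a", "b", "a", "c"), c("p1", "p2", "p3", "p4"))
cluster <- matrix(NA_character_, nrow = 3, ncol = 5)

# eq[i, j] is TRUE when data[i] == paths[j, 1]; this matrix is the
# memory cost of removing both loops
eq <- outer(data, paths[, 1], "==")

# Rows of eq with at least one match
has_hit <- rowSums(eq) > 0

# Column of the last TRUE in each row (meaningless where has_hit is FALSE)
last_hit <- max.col(eq * 1, ties.method = "last")

# Assign only the rows that actually matched
cluster[has_hit, 4] <- paths[last_hit[has_hit], 2]
```

With 100 values in data and 10000 rows in paths, eq would hold a million logical values, which is manageable, but the cost grows with the product of the two sizes.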

[edit] As I understand your comment, can you just use logical indexing?

indx <- data==paths[, 1]
cluster[indx, 4] <- paths[indx, 2]
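
Note that this compares data and paths[,1] element by element, so it assumes both have the same length and that row k of one corresponds to row k of the other. A small sketch on invented toy data:

```r
# Toy data, invented for illustration only; same length, compared row by row
data  <- c("a", "x", "c")
paths <- cbind(c("a", "b", "c"), c("p1", "p2", "p3"))
cluster <- matrix(NA_character_, nrow = 3, ncol = 5)

# TRUE where the k-th value of data equals the k-th key in paths
indx <- data == paths[, 1]

# Copy the second column of paths into cluster for the matching rows only
cluster[indx, 4] <- paths[indx, 2]
```

If a value in data can match a key on a different row of paths, this is not equivalent to the double loop and a lookup such as match() is the safer choice.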


Answer 2:

I think that both loops can be vectorized using the following:

m <- match(paths[1:100,1], data[1:10])
cluster[na.omit(m),4] <- paths[!is.na(m),2]
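
Because match() returns only the first position, the line above fills just the first occurrence of a value that is repeated in data. A hedged variant that looks up in the other direction (each value of data in the key column of paths) is sketched below on invented toy data; reversing the key column is an assumption made here to reproduce the loop's "last matching row wins" behaviour.

```r
# Toy data, invented for illustration only; "a" is repeated in both inputs
data  <- c("a", "b", "z", "a")
paths <- cbind(c("a", "b", "a", "c"), c("p1", "p2", "p3", "p4"))
cluster <- matrix(NA_character_, nrow = 4, ncol = 5)

n <- nrow(paths)

# match() finds the first hit, so search the reversed key column and map
# the position back to the original row to get the last matching row
m <- n + 1L - match(data, rev(paths[, 1]))

# Values of data with no key in paths give NA and are left untouched
found <- !is.na(m)
cluster[found, 4] <- paths[m[found], 2]
```

This avoids the comparison matrix entirely, so memory use stays proportional to the lengths of the inputs rather than their product.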