foreach and non-parallel change of iterators

2019-09-10 06:21发布

问题:

I need to execute foreach with two iteration variables of the same size (for example 1:n), but the function changes them in parallel as written here:

We call a and b the iteration variables, since those are the variables that are changing during the multiple executions. Note that we are iterating over them in parallel, that is, they are both changing at the same time.

What I need is to make foreach to change them independently, so that I would have a list with length n^2, not n.

example:

X = foreach(i=1:n, j=1:n) %do% (sum(M[i,]*M[j,]))

in the end I get a vector of length n which is only a diagonal of matrix X, not the full matrix.

P.S. I was trying to make this with for looping, but the computation time was too great to leave the code unoptimized.

回答1:

To nest foreach loops, use the nesting operator, "%:%":

library(foreach)
n <- 4
M <- matrix(rnorm(n*n), n)
X <- foreach(i=1:n, .combine='cbind') %:%
       foreach(j=1:n, .combine='c') %do% {
         sum(M[i,]*M[j,])
       }

To avoid repeated computations, you can have j iterate from i to n, and use a .final function to pad the resulting vectors with NA:

pad <- function(x) c(rep(NA, n - length(x)), x)
Y <- foreach(i=1:n, .combine='cbind') %:%
       foreach(j=i:n, .combine='c', .final=pad) %do% {
         sum(M[i,]*M[j,])
       }

But these solutions are only of academic interest. For simplicity and speed, I suspect that tcrossprod is the best solution by far:

Z <- tcrossprod(M)

For a 4000 X 4000 matrix, this executed in under 8 seconds on my Linux machine.



回答2:

foreach is not more efficient than for. Look for the %:% operator as commented by @BenBarnes to use it. Parallelization might help a bit, but not much.

Try the following instead of an explicit loop:

M <- matrix(1:8,4)

prodsums <- combn(seq_len(nrow(M)), 2, FUN=function(ind) {
  res <- sum(M[ind[1],]*M[ind[2],])
  names(res) <- paste(ind, collapse="*")
  res
}, simplify=F)

unlist(prodsums)
#1*2 1*3 1*4 2*3 2*4 3*4 
# 32  38  44  48  56  68 

resmat <- matrix(ncol=nrow(M),nrow=nrow(M))
resmat[lower.tri(resmat)] <- unlist(prodsums)
#       [,1] [,2] [,3] [,4]
# [1,]   NA   NA   NA   NA
# [2,]   32   NA   NA   NA
# [3,]   38   48   NA   NA
# [4,]   44   56   68   NA

resmat[upper.tri(resmat)] <- t(resmat)[upper.tri(resmat)]

diag(resmat) <- rowSums(M^2)
#     [,1] [,2] [,3] [,4]
#[1,]   26   32   38   44
#[2,]   32   40   48   56
#[3,]   38   48   58   68
#[4,]   44   56   68   80