Double For Loop to calculate averages and store th

Posted 2019-05-29 09:43

Question:

I'm having trouble getting this double for loop to store the calculated values in the right cells of a matrix. I chose a double for loop rather than apply() or mean() because I want only the unique combinations of the two columns, without redundancy (explained below). Here is an example:

A<-c(1,2,3,4,5)
B<-c(2,3,4,5,6)
Q1<-data.frame(cbind(A,B))
mean<-matrix(nrow=5, ncol = 5)
for(i in 1: length(Q1$A)){
  for(j in 2: length(Q1$B)){
    mean[i,j]<-sum(Q1$A[i]+Q1$B[j])/2
  }
}

Here I tried to pair each element of A with the elements of B while eliminating redundancy, so that A[1] is paired with the four values B[2] through B[5], A[2] with the three values B[3] through B[5], and so on. However, this was my result:

     [,1] [,2] [,3] [,4] [,5]
[1,]   NA  2.0  2.5  3.0  3.5
[2,]   NA  2.5  3.0  3.5  4.0
[3,]   NA  3.0  3.5  4.0  4.5
[4,]   NA  3.5  4.0  4.5  5.0
[5,]   NA  4.0  4.5  5.0  5.5

The first column is what I expected, but the diagonal and the lower triangle contain values I don't want. The matrix I want instead is:

     [,1] [,2] [,3] [,4] [,5]
[1,]   NA  2.0  2.5  3.0  3.5
[2,]   NA   NA  3.0  3.5  4.0
[3,]   NA   NA   NA  4.0  4.5
[4,]   NA   NA   NA   NA  5.0
[5,]   NA   NA   NA   NA   NA

Any suggestions?

Answer 1:

[Original Solution (see Update 2 for the faster solutions)]

f.m <- function(Q1) {
    z <- matrix(nrow=nrow(Q1),ncol=nrow(Q1))
    b <- row(z) < col(z)
    # cell [i,j] holds (A[i] + B[j])/2, so A is indexed by row and B by column
    z[b] <- (Q1$A[row(z)[b]] + Q1$B[col(z)[b]])/2
    z
}

[Sample output]

f.m(Q1)
#      [,1] [,2] [,3] [,4] [,5]
# [1,]   NA    2  2.5  3.0  3.5
# [2,]   NA   NA  3.0  3.5  4.0
# [3,]   NA   NA   NA  4.0  4.5
# [4,]   NA   NA   NA   NA  5.0
# [5,]   NA   NA   NA   NA   NA
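
The masking idea can be checked against an explicit loop. The sketch below is only an illustration: the helper names loop_means and masked_means are hypothetical, and the inputs are chosen so that A[i] + B[j] differs from A[j] + B[i]. With the question's sample data (B = A + 1) the two sums coincide, so a swapped row/column index would go unnoticed there.

```r
# Hypothetical helper names; inputs chosen so that A[i] + B[j] != A[j] + B[i].
A <- c(10, 20, 30)
B <- c(1, 2, 4)

# Reference: explicit loops over the cells with i < j only (assumes length >= 1).
loop_means <- function(A, B) {
  n <- length(A)
  out <- matrix(NA_real_, nrow = n, ncol = n)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      out[i, j] <- (A[i] + B[j]) / 2
    }
  }
  out
}

# Vectorized: the row index selects from A, the column index from B.
masked_means <- function(A, B) {
  n <- length(A)
  out <- matrix(NA_real_, nrow = n, ncol = n)
  upper <- row(out) < col(out)
  out[upper] <- (A[row(out)[upper]] + B[col(out)[upper]]) / 2
  out
}

# Both approaches should agree cell by cell.
stopifnot(identical(loop_means(A, B), masked_means(A, B)))
```

If the row and column indices were swapped in masked_means, the check would fail on these inputs, which makes it a cheap test to keep next to any vectorized rewrite.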

[Benchmarking Setup]

f0 <- function(Q1) {
    mean<-matrix(nrow=nrow(Q1), ncol = nrow(Q1))
    for(i in 1: length(Q1$A)){
        for(j in 2: length(Q1$B)){
            mean[i,j]<-sum(Q1$A[i]+Q1$B[j])/2
        }
    }
    mean
}

f1 <- function(Q1) {
    mean<-matrix(nrow=nrow(Q1), ncol = nrow(Q1))
    for(i in 2: length(Q1$A)){
        for(j in i: length(Q1$B)){
            mean[i,j]<-sum(Q1$A[i]+Q1$B[j])/2
        }
    }
    mean
} 

# Note that f0() and f1() don't return the desired result for the sample output

f2 <- function(Q1) {
    mean<-outer(1: length(Q1$A), 
                1: length(Q1$B),
                Vectorize(function(i,j){
                    if(i >= j)
                        return(NA)
                    else 
                        return(sum(Q1$A[i]+Q1$B[j])/2)
                }))
    mean
}

library(rbenchmark)

[Benchmarking Result]

A <- B <- 1:100
Q1<-data.frame(A,B)

benchmark(f0(Q1), f1(Q1), f2(Q1), f.m(Q1), replications = 10)
     test replications elapsed relative user.self sys.self user.child sys.child
4 f.m(Q1)           10   0.011    1.000     0.012    0.000          0         0
1  f0(Q1)           10   3.018  274.364     3.007    0.008          0         0
2  f1(Q1)           10   1.477  134.273     1.474    0.003          0         0
3  f2(Q1)           10   1.777  161.545     1.774    0.002          0         0

[Update 1]

Roughly another order of magnitude in running time can be saved by computing the entire matrix directly, which avoids subsetting, an operation that is costly compared to summation:

f.m2 <- function(Q1) outer(Q1$A,Q1$B,'+')*0.5

A further benchmark:

A <- B <- 1:1000
Q1<-data.frame(A,B)
#benchmark(f0(Q1), f1(Q1), f2(Q1), f.m(Q1), replications = 10)
benchmark(f.m(Q1), f.m2(Q1), replications = 10)

      test replications elapsed relative user.self sys.self user.child sys.child
1  f.m(Q1)           10   1.839   10.274     1.746    0.093          0         0
2 f.m2(Q1)           10   0.179    1.000     0.144    0.035          0         0

[Update 2]

1) As noted by David Arenburg, f.m2() does not produce exactly the expected output, because the lower-left triangle and the main diagonal should be filled with NAs. It can be fixed to produce the proper answer at the cost of performance (see the benchmark below).

# Suggested by David Arenburg
f.m2.1 <- function(Q1) { 
   Res <- outer(Q1$A,Q1$B,'+')*0.5; 
   Res[lower.tri(Res, diag = TRUE)] <- NA; 
   Res 
}
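
As a quick sanity check, the same two steps can be run on the question's five-element sample data. This is a minimal sketch; the variable name full is an assumption introduced here, not part of the original answer.

```r
A <- c(1, 2, 3, 4, 5)
B <- c(2, 3, 4, 5, 6)

# full[i, j] is (A[i] + B[j]) / 2 for every pair, including the unwanted ones.
full <- outer(A, B, "+") / 2

# Blank out the diagonal and everything below it.
full[lower.tri(full, diag = TRUE)] <- NA

print(full)
```

Only the ten cells above the diagonal keep a value, which is the layout the question asks for.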

2) Here is another approach suggested by David Arenburg, which makes use of the CJ function from the data.table package:

library(data.table)
f.DA <- function(Q1){ 
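  # CJ varies its second argument fastest, so fill by row to get Res[i,j] = (A[i] + B[j])/2;
  # sorted = FALSE keeps the input order of A and B
  Res <- matrix(rowMeans(CJ(Q1$A, Q1$B, sorted = FALSE)), ncol = nrow(Q1), byrow = TRUE)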
  Res[lower.tri(Res, diag = TRUE)] <- NA
  Res 
}

3) Here is an Rcpp-based approach:

library(Rcpp)
cppFunction('NumericMatrix fC(NumericVector A, NumericVector B) {

  int n = A.size();
  NumericMatrix out(n,n);
  std::fill( out.begin(), out.end(), NumericVector::get_na() ) ;

  for (int i = 0; i < n; i++) {
    for (int j = i+1; j < n; j++) {
      out(i,j) = 0.5*(A[i] + B[j]);
    }
  }
  return out;
}')

4) And another benchmarking study:

A <- B <- 1:3000
Q1<-data.frame(A,B)
benchmark(f.m2(Q1), f.m2.1(Q1), f.DA(Q1), fC(Q1$A, Q1$B), replications = 10)

            test replications elapsed relative user.self sys.self user.child sys.child
3       f.DA(Q1)           10   7.442   11.556     6.200    1.209          0         0
2     f.m2.1(Q1)           10   5.111    7.936     4.404    0.661          0         0
1       f.m2(Q1)           10   1.007    1.564     0.733    0.263          0         0
4 fC(Q1$A, Q1$B)           10   0.644    1.000     0.525    0.116          0         0


Answer 2:

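The second for loop should start just past the diagonal:

 for(j in (i+1):length(Q1$B))

The outer loop must then stop at length(Q1$A)-1, because (i+1):length(Q1$B) counts downward once i reaches the last index and would run out of bounds.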
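
A complete loop using this inner range might look like the sketch below. It assumes the question's sample vectors; the result is stored in a matrix named res (a name chosen here so that base::mean is not masked), and the outer loop stops at n - 1 because (n + 1):n would count down in R instead of being empty.

```r
A <- c(1, 2, 3, 4, 5)
B <- c(2, 3, 4, 5, 6)
n <- length(A)

res <- matrix(NA_real_, nrow = n, ncol = n)
for (i in seq_len(n - 1)) {   # last row stays NA; also avoids (n + 1):n
  for (j in (i + 1):n) {      # only columns to the right of the diagonal
    res[i, j] <- (A[i] + B[j]) / 2
  }
}
print(res)
```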


Answer 3:

You can use the next keyword to skip the iterations you don't need, as in:

A<-c(1,2,3,4,5)
B<-c(2,3,4,5,6)
Q1<-data.frame(cbind(A,B))
mean<-matrix(nrow=5, ncol = 5)
for(i in 1: length(Q1$A))
for(j in 2: length(Q1$B)){
    if(i >= j)
        next
    mean[i,j]<-sum(Q1$A[i]+Q1$B[j])/2
}

or you could make the range of the inner for loop depend on the value of the outer index, as in:

mean<-matrix(nrow=5, ncol = 5)
for(i in 1: (length(Q1$A)-1)){
    for(j in (i+1): length(Q1$B)){
        mean[i,j]<-sum(Q1$A[i]+Q1$B[j])/2
    }
}

or you could use outer() as in:

mean<-outer(1: length(Q1$A), 
            1: length(Q1$B),
            Vectorize(function(i,j){
                if(i >= j)
                    return(NA)
                else 
                    return(sum(Q1$A[i]+Q1$B[j])/2)
            }))


Answer 4:

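Not exactly a double for loop, but you could just use the outer function to calculate the averages. Note that this fills every cell, so the diagonal and lower triangle still have to be set to NA to match the output in the question.

outer(Q1$A, Q1$B, "+")/2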


Tags: r for-loop mean