可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

(Very) amateur coder and statistician working on a problem in R.

I have four integer lists: A, B, C, D.

A <- [1:133]
B <- [1:266]
C <- [1:266]
D <- [1:133, 267-400]

I want R to generate all of the permutations from picking 1 item from each of these lists (I know this code will take forever to run), and then take the mean of each of those permutations. So, for instance, [1, 100, 200, 400] -> 175.25.

Ideally what I would have at the end is a list of all of these means then.

Any ideas?

回答1:

Here's how I'd do this for a smaller but similar problem:

A <- 1:13
B <- 1:26
C <- 1:26
D <- c(1:13, 27:40)

mymat <- expand.grid(A, B, C, D)
names(mymat) <- c("A", "B", "C", "D")
mymat <- as.matrix(mymat)
mymeans <- rowSums(mymat)/4

You'll probably crash R if you just up all the indices, but you could probably set up a loop, something like this (not tested):

B <- 1:266
C <- 1:266
D <- c(1:133, 267:400)

for(A in 1:133) {
    mymat <- expand.grid(A, B, C, D)
    names(mymat) <- c("A", "B", "C", "D")
    mymat <- as.matrix(mymat)
    mymeans <- rowSums(mymat)/4
    write.table(mymat, file = paste("matrix", A, "txt", sep = "."))
    write.table(mymeans, file = paste("means", A, "txt", sep = "."))
    rm(mymat, mymeans)
}

to get them all. That still might be too big, in which case you could do a nested loop, or loop over D (since it's the biggest)

Alternatively,

n <- 1e7
A <- sample(133, size = n, replace= TRUE)
B <- sample(266, size = n, replace= TRUE)
C <- sample(266, size = n, replace= TRUE)
D <- sample(x = c(1:133, 267:400), size = n, replace= TRUE)
mymeans <- (A+B+C+D)/4

will give you a large sample of the means and take no time at all.

hist(mymeans)

回答2:

Even creating a vector of means as large as your permutations will use up all of your memory. You will have to split this into smaller problems, look up writing objects to excel and then removing objects from memory here (both on SO).

As for the code to do this, I've tried to keep it as simple as possible so that it's easy to 'grow' your knowledge:

#this is how to create vectors of sequential integers integers in R
a <- c(1:33)
b <- c(1:33)
c <- c(1:33)
d <- c(1:33,267:300)

#this is how to create an empty vector
means <- rep(NA,length(a)*length(b)*length(c)*length(d))
#set up for a loop
i <- 1

#how you run a loop to perform this operation
for(j in 1:length(a)){
    for(k in 1:length(b)){
        for(l in 1:length(c)){
            for(m in 1:length(d)){
                y <- c(a[j],b[k],c[l],d[m])
                means[i] <- mean(y)
                i <- i+1
            }
        }
    }
}

#and to graph your output
hist(means, col='brown')
#lets put a mean line through the histogram
abline(v=mean(means), col='white', lwd=2)