Replace rbind in for-loop with lapply? (2nd circle of hell)

Posted 2019-01-24 21:11

Question:

I am having trouble optimising a piece of R code. The following example code should illustrate my optimisation problem:

Some initialisations and a function definition:

    a <- c(10,20,30,40,50,60,70,80)
    b <- c("a","b","c","d","z","g","h","r")
    c <- c(1,2,3,4,5,6,7,8)
    myframe <- data.frame(a,b,c)
    columns <- 6   # must be defined here, because the next two lines use it
    values <- vector(length=columns)
    solution <- matrix(nrow=nrow(myframe),ncol=columns+3)

    myfunction <- function(frame,columns){
       value <- vector(length=columns)   # value must exist before it is indexed
       athing = 0
       if(columns == 5){
          athing = 100
       }
       else{
          athing = 1000
       }
       value[columns+1] = athing
       return(value)
    }

The problematic for-loop looks like this:

    columns = 6
    for(i in 1:nrow(myframe)){
       values <- myfunction(as.matrix(myframe[i,]), columns)
       values[columns+2] = i
       values[columns+3] = myframe[i,3]
       #more columns added with simple operations (e.g. sum)

       solution <- rbind(solution,values)
       #solution is a large matrix from outside the for-loop
    }

The problem seems to be the rbind function. I frequently get error messages about the size of solution, which becomes too large after a while (more than 50 MB). I want to replace this loop and the rbind with a list and lapply and/or foreach. I have started by converting myframe to a list:

    myframe_list <- lapply(seq_len(nrow(myframe)), function(i) myframe[i,])
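A minimal sketch of how that list could be carried through with lapply, assuming the question's data and a condensed, typo-fixed myfunction (the names rows and new_rows are illustrative only). Each lapply call returns one finished row, and a single do.call(rbind, ...) at the end replaces the rbind that ran on every iteration.

```r
# Assumed setup: the question's data and a condensed, typo-fixed myfunction
a <- c(10, 20, 30, 40, 50, 60, 70, 80)
b <- c("a", "b", "c", "d", "z", "g", "h", "r")
c <- c(1, 2, 3, 4, 5, 6, 7, 8)
myframe <- data.frame(a, b, c)
columns <- 6

myfunction <- function(frame, columns) {
  value <- numeric(columns)                       # placeholder entries
  value[columns + 1] <- if (columns == 5) 100 else 1000
  value
}

myframe_list <- lapply(seq_len(nrow(myframe)), function(i) myframe[i, ])

# Build one result row per list element; nothing is grown inside the loop
rows <- lapply(seq_along(myframe_list), function(i) {
  row <- myframe_list[[i]]
  values <- myfunction(as.matrix(row), columns)
  values[columns + 2] <- i          # row index, as in the original loop
  values[columns + 3] <- row[[3]]   # third column of that row
  values
})

# A single rbind at the end instead of one per iteration
new_rows <- do.call(rbind, rows)
```

If an existing solution matrix has to be kept, it can be combined once with rbind(solution, new_rows).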

I have not got much further than this, although I tried applying a very good introduction to parallel processing.
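Because each row is computed independently, the per-row work can also be handed to a parallel version of lapply. The sketch below uses mclapply from R's bundled parallel package; build_row and n_cores are hypothetical names, and the data and condensed myfunction are assumed from the question. mclapply relies on forking, which is not available on Windows, so the sketch falls back to a single core there. For rows this cheap, the cost of starting workers will likely outweigh any speed-up.

```r
library(parallel)

# Assumed setup: the question's data and a condensed, typo-fixed myfunction
a <- c(10, 20, 30, 40, 50, 60, 70, 80)
b <- c("a", "b", "c", "d", "z", "g", "h", "r")
c <- c(1, 2, 3, 4, 5, 6, 7, 8)
myframe <- data.frame(a, b, c)
columns <- 6

myfunction <- function(frame, columns) {
  value <- numeric(columns)
  value[columns + 1] <- if (columns == 5) 100 else 1000
  value
}

# Hypothetical helper: computes the complete result row for row index i
build_row <- function(i) {
  values <- myfunction(as.matrix(myframe[i, ]), columns)
  values[columns + 2] <- i
  values[columns + 3] <- myframe[i, 3]
  values
}

# Forking is unavailable on Windows, so use one core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
rows <- mclapply(seq_len(nrow(myframe)), build_row, mc.cores = n_cores)

# mclapply returns results in input order; combine them once
new_rows <- do.call(rbind, rows)
```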

How should I restructure the for-loop without having to change myfunction? I am open to other solutions, too.

Edit: This problem seems to come straight from the 2nd Circle of Hell ("Growing Objects") in The R Inferno. Any suggestions?

Answer 1:

Using rbind in a loop like this is bad practice because in each iteration you enlarge solution and copy it into a new object. This copying is slow (the total amount copied grows quadratically with the number of rows) and can also lead to memory problems. One way around this is to create a list whose ith component stores the output of the ith loop iteration, and then call rbind on that list just once at the end. This will look something like:

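    my.list <- vector("list", nrow(myframe))
    for(i in seq_len(nrow(myframe))){
        # Same commands as in the question to create values
        values <- myfunction(as.matrix(myframe[i,]), columns)
        values[columns+2] <- i
        values[columns+3] <- myframe[i,3]
        my.list[[i]] <- values
    }
    solution <- rbind(solution, do.call(rbind, my.list))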
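Another common remedy for growing objects is to allocate the result matrix once, at its final size, and fill it row by row, so no iteration copies the matrix. A sketch under the same assumptions as the question's setup (condensed, typo-fixed myfunction; new_rows is an illustrative name):

```r
# Assumed setup: the question's data and a condensed, typo-fixed myfunction
a <- c(10, 20, 30, 40, 50, 60, 70, 80)
b <- c("a", "b", "c", "d", "z", "g", "h", "r")
c <- c(1, 2, 3, 4, 5, 6, 7, 8)
myframe <- data.frame(a, b, c)
columns <- 6

myfunction <- function(frame, columns) {
  value <- numeric(columns)
  value[columns + 1] <- if (columns == 5) 100 else 1000
  value
}

n <- nrow(myframe)
# Allocate the full result once; each iteration only writes one row in place
new_rows <- matrix(NA_real_, nrow = n, ncol = columns + 3)
for (i in seq_len(n)) {
  values <- myfunction(as.matrix(myframe[i, ]), columns)
  values[columns + 2] <- i
  values[columns + 3] <- myframe[i, 3]
  new_rows[i, ] <- values
}
```

This requires knowing the number of rows and columns in advance, which is the case here; the list approach above is more convenient when the size is not known up front.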


Answer 2:

A bit too long for a comment, so I am posting it here. If columns is known in advance:

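    myfunction <- function(frame){
       value <- vector(length=columns)
       athing = 0
       if(columns == 5){
          athing = 100
       }
       else{
          athing = 1000
       }
       value[columns+1] = athing
       return(value)
    }

    # MARGIN = 1 applies myfunction to each row, like the original loop;
    # apply() returns one column per row, hence t()
    t(apply(myframe, 1, myfunction))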

If columns is not available in the calling environment, keep your original myfunction definition and pass columns through apply:

    t(apply(myframe, 1, myfunction, columns))
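A small self-contained check of this apply() variant, assuming the question's data and a condensed, typo-fixed two-argument myfunction (res and solution_part are illustrative names). apply() first coerces the data frame to a matrix (a character matrix here, because of column b), calls myfunction once per row with columns forwarded through its ... argument, and stores each returned vector as a column, which is why the result is transposed.

```r
# Assumed setup: the question's data and a condensed, typo-fixed myfunction
a <- c(10, 20, 30, 40, 50, 60, 70, 80)
b <- c("a", "b", "c", "d", "z", "g", "h", "r")
c <- c(1, 2, 3, 4, 5, 6, 7, 8)
myframe <- data.frame(a, b, c)
columns <- 6

myfunction <- function(frame, columns) {
  value <- numeric(columns)
  value[columns + 1] <- if (columns == 5) 100 else 1000
  value
}

# One call per row; `columns` is passed on to myfunction via apply's ...
res <- apply(myframe, 1, myfunction, columns)

# res has one column per row of myframe; transpose to get one row per row
solution_part <- t(res)
```

Note that this only covers what myfunction returns; the extra columns the loop adds (row index, third column) would still have to be appended, for example with cbind(solution_part, seq_len(nrow(myframe)), myframe$c).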