mclapply vs for loops for plotting: speed and scal

Posted 2020-03-06 03:22

Question:

I am running a function in R that can take a long time, as it carries out multiple commands to transform and subset some data before passing it to ggplot for plotting. I need to run this function many times with different argument values. The example below is a simple one, but I was wondering how to speed it up if scaled up, i.e. what is the fastest way of running every single combination? Is there a generic method of converting for loops into mclapply calls, assuming those are faster? Please feel free to provide alternative mock examples that demonstrate a preference for a particular method.

mock example:

the basic function:

ff <- function(n, mu, stdev){
     x1 <- c(1:n)
     y1 <- rnorm(n,mu,stdev)
     z1 <- data.frame(cbind(x1,y1))
     ggplot(z1, aes(x=x1,y=y1))+
       geom_point()+
       labs(title=paste("n=",n,"mu=",mu, "stdev=",stdev))
}

so the naive way of going through the parameters would be to do the following...

for(i in 1:10){
    for(j in 1:2){
       for(k in seq(100,500,by=100)){
         ff(k,i,j)
       }
    }
}

What would be the fastest way of speeding this up? I'm assuming it might need something like expand.grid(x=c(1:10),y=c(1:2),z=seq(100,500,by=100)) and then using mclapply to run through each row in some sort of parallel manner (I have 4 cores available for this). Please feel free to pull bits out of the basic function, or put things into it, according to whichever methods give the greatest improvement in speed. The process will obviously take longer as the range of each parameter increases; is there nothing that can be done about that, or can that be changed too if the work is split across more cores?

and for bonus points...is there anything that will save the output images and create sliders like in the package manipulate to go through all the parameters in an interactive manner...in which all it is doing is pulling out the relevant image, rather than recalculating it each time.

N.B. Please feel free to use/suggest any other packages (like foreach) that you think might be useful for your solution

Answer 1:

Saving the output images is pretty easy. Simply call ggsave() in your ff() function.

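library(ggplot2)

ff <- function(n, mu, stdev){
  # simulate n points and put them in a data frame for ggplot
  z1 <- data.frame(x1 = 1:n, y1 = rnorm(n, mu, stdev))
  p <- ggplot(z1, aes(x=x1, y=y1))+
    geom_point()+
    labs(title=paste("n=",n,"mu=",mu, "stdev=",stdev))
  # pass the plot explicitly rather than relying on last_plot()
  ggsave(paste0(n,"_", mu, "_", stdev, ".jpeg"), plot = p)
}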

You were spot on with your suggestion to use expand.grid(). Here's what I'd do:

x <- expand.grid(i = 1:10, j = 1:2, k = seq(100,500,100))

And then to call it, I'd use lapply(), or mclapply() if you're on Linux or macOS and have multiple cores available (mclapply() relies on forking, which Windows does not support):

# ff() expects (n, mu, stdev), i.e. columns k, i, j as in the original loop
lapply(seq_len(nrow(x)), function(r) ff(x$k[r], x$i[r], x$j[r]))

This creates 100 jpegs that have the naming convention of "n_mu_stdev.jpeg". As for an efficient way to access these and render them on screen, I'd look into a web browser and some simple CSS and jQuery to make it purty. That's really a separate question though IMHO.
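On the "pull out the relevant image rather than recalculating it" part, here is a rough base-R sketch of the lookup side only. It assumes the files were written with the "n_mu_stdev.jpeg" naming used above; plot_path() and the dir argument are hypothetical names I made up for illustration. A slider front end (manipulate, a web page, etc.) would then only need to call the helper with the current parameter values.

```r
# Hypothetical helper: rebuild the file name that ff() used when saving.
# Assumes the "n_mu_stdev.jpeg" convention; `dir` is an illustrative argument.
plot_path <- function(n, mu, stdev, dir = ".") {
  file.path(dir, paste0(n, "_", mu, "_", stdev, ".jpeg"))
}

# Index of every parameter combination and the file expected for it.
grid <- expand.grid(mu = 1:10, stdev = 1:2, n = seq(100, 500, by = 100))
grid$file <- plot_path(grid$n, grid$mu, grid$stdev)

# Flag which images have already been rendered, so nothing is recomputed.
grid$exists <- file.exists(grid$file)

# Look up a single combination, e.g. n = 300, mu = 4, stdev = 2.
plot_path(300, 4, 2)
```

Keeping the naming logic in one function means the saving code and the viewer cannot drift apart on how file names are built.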



Answer 2:

If using mclapply, combine the parameters into a list and pass that to the function rather than using a for loop.

e.g.

library(ggplot2)

df <- expand.grid(i = 1:10, j = 1:2, k = seq(100, 500, 100))
# one list per parameter combination: n = k, mu = i, stdev = j
params <- mapply(list, n = df$k, mu = df$i, stdev = df$j, SIMPLIFY = FALSE)

ff <- function(tlist) {
  n <- tlist$n
  mu <- tlist$mu
  stdev <- tlist$stdev
  x1 <- c(1:n)
  y1 <- rnorm(n,mu,stdev)
  z1 <- data.frame(cbind(x1,y1))
  ggplot(z1, aes(x=x1,y=y1))+
    geom_point()+
    labs(title=paste("n=",n,"mu=",mu, "stdev=",stdev))
}

results <- plyr::llply(params, ff, .progress='text')  # llply() comes from the plyr package

If using mclapply (from the parallel package, Linux/macOS only):

results <- parallel::mclapply(params, ff, mc.cores = 4, mc.preschedule = TRUE)
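Whether mclapply is actually faster depends on how expensive each call is, so it is worth timing both on a small grid first. A minimal sketch of that comparison follows; slow_task() is a hypothetical stand-in for the real plotting function (Sys.sleep() just mimics its cost), and the core count falls back to 1 on Windows, where forking is not available.

```r
library(parallel)  # ships with R; provides mclapply()

# Hypothetical stand-in for one slow plotting call.
slow_task <- function(p) {
  Sys.sleep(0.05)   # pretend to transform, subset and plot
  p$n * p$mu        # return something small rather than a big object
}

# Same expand.grid -> list-of-parameters pattern, on a smaller grid.
df <- expand.grid(i = 1:4, j = 1:2, k = c(100, 200))
params <- mapply(list, n = df$k, mu = df$i, stdev = df$j, SIMPLIFY = FALSE)

# mclapply() only accepts mc.cores > 1 on Unix-alikes.
n_cores <- if (.Platform$OS.type == "windows") 1L else 4L

t_serial   <- system.time(r1 <- lapply(params, slow_task))["elapsed"]
t_parallel <- system.time(r2 <- mclapply(params, slow_task, mc.cores = n_cores))["elapsed"]

# Elapsed seconds for each approach; the exact numbers vary by machine.
c(serial = unname(t_serial), parallel = unname(t_parallel))
```

If each task is very cheap, the overhead of forking and sending results back can cancel out the gain, which is one reason to save plots to disk inside the worker and return only a file name.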