I am now dealing with a large dataset and some functions may take hours to process. I wonder how I can show the progress of the code through a progress bar or number(1,2,3,...,100). And I want to store the result as a data frame with two columns. Here is an example. Thanks.
require(foreach)
require(doParallel)
require(Kendall)
cores=detectCores()
cl <- makeCluster(cores-1)
registerDoParallel(cl)
mydata=matrix(rnorm(8000*500),ncol = 500)
result=as.data.frame(matrix(nrow = 8000,ncol = 2))
pb <- txtProgressBar(min = 1, max = 8000, style = 3)
foreach(i=1:8000,.packages = "Kendall",.combine = rbind) %dopar%
{
abc=MannKendall(mydata[i,])
result[i,1]=abc$tau
result[i,2]=abc$sl
setTxtProgressBar(pb, i)
}
close(pb)
stopCluster(cl)
However, when I run the code, I did not see any progress bar showing up and the result is not right. Is there any suggestion? Thanks.
The doSNOW package has support for progress bars, while doParallel does not. Here's a way to put a progress bar in your example:
require(doSNOW)
require(Kendall)
cores <- parallel::detectCores()
cl <- makeSOCKcluster(cores)
registerDoSNOW(cl)
mydata <- matrix(rnorm(8000*500), ncol=500)
pb <- txtProgressBar(min=1, max=8000, style=3)
progress <- function(n) setTxtProgressBar(pb, n)
opts <- list(progress=progress)
result <-
foreach(i=1:8000, .packages="Kendall", .options.snow=opts,
.combine='rbind') %dopar% {
abc <- MannKendall(mydata[i,])
data.frame(tau=abc$tau, sl=abc$sl)
}
close(pb)
stopCluster(cl)
I think the pbapply package also does the job.
require(parallel)
require(pbapply)
mydata=matrix(rnorm(8000*500),ncol = 500)
cores=detectCores()
cl <- makeCluster(cores-1)
parallel::clusterExport(cl= cl,varlist = c("mydata"))
parallel::clusterEvalQ(cl= cl,library(Kendall))
result = pblapply(cl = cl,
X = 1:8000,
FUN = function(i){
abc=MannKendall(mydata[i,])
result = as.data.frame(matrix(nrow = 1,ncol = 2))
result[1,1]=abc$tau
result[1,2]=abc$sl
return(result)
})
result = dplyr::bind_rows(result)
stopCluster(cl)
From the documentation, if a socket cluster is provided via cl
then it calls parLapply()
Parallel processing can be enabled through the cl argument. parLapply is called when cl is a
’cluster’ object, mclapply is called when cl is an integer.