How to do parallel processing in R?

Posted 2019-09-12 01:34

I'm reading CSV files from a directory containing more than 100 files and then doing some processing on each one. I have an 8-core CPU, so I want to run this in parallel to finish faster.

I wrote some code, but it doesn't work for me (I'm on Linux):

library(data.table)
library(parallel)

# Calculate the number of cores
no_cores <- detectCores() - 1
# Initiate cluster
cl <- makeCluster(no_cores)

processFile <- function(f) {

  # reading the file with data.table
  df <- fread(f, colClasses = c(NA, NA, NA, "NULL", "NULL", "NULL"))

  A <- parLapply(cl, sapply(windows, function(w) { return(numOverlaps(w, df)) }))

  stopCluster(cl)
}

files <- dir("/home/shared/", recursive = TRUE, full.names = TRUE, pattern = ".*\\.txt$")

# Apply the function to all files.

result <- sapply(files, processFile)

As you can see, I'm trying to parallelize the work inside processFile (the call that computes A), but it doesn't work!

How can I run that function in parallel?

1 Answer
淡お忘
answered 2019-09-12 02:29

You have the concept upside down. Pass parLapply the list of files and let it do the work: the anonymous function should handle the entire job of processing an individual file and returning the desired result.

My suggestion would be to first make this work with a regular lapply or sapply, and only then switch on the parallel backend, exporting to the workers all the libraries and objects they need.
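For example, a sequential version might look like the sketch below (a minimal sketch, assuming the windows object and the numOverlaps() function from your question are already defined in your session):

library(data.table)

# Process one file end-to-end: read it, then count overlaps per window.
processFile <- function(f) {
  df <- fread(f, colClasses = c(NA, NA, NA, "NULL", "NULL", "NULL"))
  sapply(windows, function(w) numOverlaps(w, df))
}

# Get this working sequentially before going parallel.
result <- lapply(files, processFile)

Once that works, the parallel version has the same shape, with the file list passed as X: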

parLapply(cl, X = files, fun = function(f) {
  # code for processing one file goes here
})
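Putting it all together, a complete run could look like this (again a sketch, assuming windows and numOverlaps exist; each worker starts as a fresh R session, so packages and objects have to be loaded and exported explicitly):

library(parallel)

no_cores <- detectCores() - 1
cl <- makeCluster(no_cores)

# Workers are empty R sessions: load packages and export objects to them.
clusterEvalQ(cl, library(data.table))
clusterExport(cl, c("windows", "numOverlaps"))

result <- parLapply(cl, X = files, fun = function(f) {
  df <- fread(f, colClasses = c(NA, NA, NA, "NULL", "NULL", "NULL"))
  sapply(windows, function(w) numOverlaps(w, df))
})

# Stop the cluster once, after all files are done.
stopCluster(cl)

Note that stopCluster() belongs outside the worker function: in your original code it was called inside processFile, so the cluster would be torn down after the very first file.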