How to do parallel processing in R?

Posted 2019-09-12 01:34

I'm reading CSV files from a directory containing more than 100 files and then doing some processing on each one. I have an 8-core CPU, so I want to run this in parallel to finish faster.

I wrote some code, but it doesn't work for me (I'm on Linux):

library(data.table)
library(parallel)

# Calculate the number of cores
no_cores <- detectCores() - 1
# Initiate cluster
cl <- makeCluster(no_cores)

processFile <- function(f) {

  # reading the file with data.table
  df <- fread(f, colClasses = c(NA, NA, NA, "NULL", "NULL", "NULL"))

  A <- parLapply(cl, sapply(windows, function(w) { return(numOverlaps(w, df)) }))

  stopCluster(cl)
}

files <- dir("/home/shared/", recursive = TRUE, full.names = TRUE, pattern = ".*\\.txt$")

# Apply the function to all files.

result <- sapply(files, processFile)

As you can see, I'm trying to parallelize the work inside processFile (the call that computes A), but it doesn't work!

How can I run that function in parallel?

1 Answer
淡お忘
answered 2019-09-12 02:29

You have the concept upside down. Pass parLapply the list of files and let it do the work: the anonymous function should handle the entire job of processing an individual file and returning the desired result.

My suggestion would be to first make this work with a regular lapply or sapply, and only then switch on the parallel backend, exporting to the workers all the libraries and objects they need.
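For example, a sequential version might look like the sketch below (a minimal sketch, assuming the windows object and the numOverlaps() function from your question are already defined in your session):

library(data.table)

# Process one file end-to-end: read it, then count overlaps per window.
processFile <- function(f) {
  df <- fread(f, colClasses = c(NA, NA, NA, "NULL", "NULL", "NULL"))
  sapply(windows, function(w) numOverlaps(w, df))
}

# Get this working sequentially before going parallel.
result <- lapply(files, processFile)

Once that works, the parallel version has the same shape, with the file list passed as X: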

parLapply(cl, X = files, fun = function(f) {
  # code for processing one file goes here
})
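Putting it all together, a complete run could look like this (again a sketch, assuming windows and numOverlaps exist; each worker starts as a fresh R session, so packages and objects have to be loaded and exported explicitly):

library(parallel)

no_cores <- detectCores() - 1
cl <- makeCluster(no_cores)

# Workers are empty R sessions: load packages and export objects to them.
clusterEvalQ(cl, library(data.table))
clusterExport(cl, c("windows", "numOverlaps"))

result <- parLapply(cl, X = files, fun = function(f) {
  df <- fread(f, colClasses = c(NA, NA, NA, "NULL", "NULL", "NULL"))
  sapply(windows, function(w) numOverlaps(w, df))
})

# Stop the cluster once, after all files are done.
stopCluster(cl)

Note that stopCluster() belongs outside the worker function: in your original code it was called inside processFile, so the cluster would be torn down after the very first file.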