I'm reading CSV files from a directory containing more than 100 files and then doing some processing on each one. I have an 8-core CPU, so I want to run this in parallel to finish faster.
I wrote some code, but it doesn't work for me (I'm using Linux):
library(data.table)
library(parallel)

# Calculate the number of cores
no_cores <- detectCores() - 1

# Initiate cluster
cl <- makeCluster(no_cores)

processFile <- function(f) {
  # reading file with data.table
  df <- fread(f, colClasses = c(NA, NA, NA, "NULL", "NULL", "NULL"))
  A <- parLapply(cl, sapply(windows, function(w) { return(numOverlaps(w, df)) }))
  stopCluster(cl)
}
files <- dir("/home/shared/", recursive = TRUE, full.names = TRUE, pattern = ".*\\.txt$")
# Apply the function to all files.
result <- sapply(files, processFile)
As you can see, I want to run the function inside processFile (the line assigning A), but it doesn't work!
How can I run that function in parallel?
You have the concept on its head. You need to pass parLapply the list of files and then work on them. The function you pass should do the entire job of processing an individual file and returning the desired result. My suggestion would be to first make this work using a regular lapply or sapply, and only then power up the parallel backend, exporting all necessary libraries and objects the workers may need.
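Here is a minimal sketch of that shape (windows and numOverlaps come from your own code; I'm assuming they are defined in your global environment):

library(data.table)
library(parallel)

processFile <- function(f) {
  # read the file, keeping only the first three columns
  df <- fread(f, colClasses = c(NA, NA, NA, "NULL", "NULL", "NULL"))
  # do the entire per-file computation here and return the result
  sapply(windows, function(w) numOverlaps(w, df))
}

files <- dir("/home/shared/", recursive = TRUE, full.names = TRUE, pattern = ".*\\.txt$")

# Step 1: check that the serial version works
# result <- lapply(files, processFile)

# Step 2: parallelize over the files
no_cores <- detectCores() - 1
cl <- makeCluster(no_cores)
# workers are fresh R sessions: load packages and export every
# object processFile uses (assumed to exist in the global env)
clusterEvalQ(cl, library(data.table))
clusterExport(cl, c("windows", "numOverlaps"))
result <- parLapply(cl, files, processFile)
stopCluster(cl)

Note that the cluster is created once, used for all files, and stopped once at the end; calling stopCluster inside processFile, as in your version, would kill the workers after the first file.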