So far, all I've read about parallel processing in R involves looking at multiple rows of one dataframe.
But what if I have two or three large dataframes that I want to run a long function on? Can I assign each call of the function to a specific core so I don't have to wait for them to run sequentially? I'm on Windows.
Let's say this is the function:
library(stringr)  # str_extract
library(iotools)  # dstrfw
AltAlleleRecounter <- function(names,data){
  data$AC <- 0
  numalleles <- numeric(length=nrow(data))
  for(i in names){
    # keep only the leading genotype, e.g. "0/1"
    genotype <- str_extract(data[,i],"^[^/]/[^/]")
    GT <- dstrfw(genotype,c('character','character','character'),c(1L,1L,1L))
    called <- GT$V1!='.'  # rows with a called genotype
    # alleles are read as characters, so convert before adding
    data$AC[called] <- data$AC[called]+as.numeric(GT$V1[called])+as.numeric(GT$V3[called])
    numalleles[called] <- numalleles[called] + 2
  }
  data$AF <- data$AC/numalleles
  return(data)
}
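To make the intent concrete, here is a minimal base-R sketch of the same counting on a toy table. The names toy, count_alt, s1 and s2 are hypothetical, and it uses substr() in place of str_extract()/dstrfw(), so it assumes every cell starts with a single-digit genotype such as "0/1".

```r
# Hypothetical toy input: two sample columns holding VCF-style genotypes
toy <- data.frame(s1 = c("0/1:35", "1/1:20", "./.:0"),
                  s2 = c("0/0:12", "./.:0", "1/1:8"),
                  stringsAsFactors = FALSE)

# Simplified stand-in for AltAlleleRecounter (assumption: one-character alleles)
count_alt <- function(names, data) {
  ac <- numeric(nrow(data))   # alternate allele count per row
  n  <- numeric(nrow(data))   # number of called alleles per row
  for (nm in names) {
    a1 <- substr(data[[nm]], 1, 1)   # first allele
    a2 <- substr(data[[nm]], 3, 3)   # second allele
    called <- a1 != "."              # skip missing genotypes
    ac[called] <- ac[called] + as.numeric(a1[called]) + as.numeric(a2[called])
    n[called]  <- n[called] + 2
  }
  data$AC <- ac
  data$AF <- ac / n
  data
}

res <- count_alt(c("s1", "s2"), toy)
print(res[, c("AC", "AF")])
```

Missing genotypes ("./.") contribute neither to AC nor to the denominator, which is the behaviour the function above is aiming for.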
What I want to do is basically this (generic pseudocode):
wait_till_everything_is_finished(
  core1="data1 <- AltAlleleRecounter(sampleset1,data1)",
  core2="data2 <- AltAlleleRecounter(sampleset2,data2)",
  core3="data3 <- AltAlleleRecounter(sampleset3,data3)"
)
where all three commands run at the same time, but the program doesn't continue until every one of them has finished.
Edit: Bryan's suggestion worked. I replaced "otherList" with my second list. This is example code:
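library(doParallel)  # also attaches foreach and parallel
cl <- makeCluster(2); registerDoParallel(cl)  # without a registered backend, %dopar% runs sequentially
myframelist <- list(data1,data2)
mynameslist <- list(names1,names2)
# .packages loads the packages AltAlleleRecounter needs on each worker
myframelist <- foreach(i=1:2, .packages=c("stringr","iotools")) %dopar% (AltAlleleRecounter(mynameslist[[i]],myframelist[[i]]))
myfilenamelist <- list("data1.tsv","data2.tsv")
foreach(i=1:2) %dopar% (write.table(myframelist[[i]], file=myfilenamelist[[i]], quote=FALSE, sep="\t", row.names=FALSE, col.names=TRUE))
stopCluster(cl)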
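The data variables are dataframes and the name variables are just character vectors. On Windows each worker is a fresh R session, so any packages the function uses have to be loaded on the workers as well, for example through the .packages argument of foreach.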
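For comparison, the same "run each call on its own worker and wait for all of them" pattern can be sketched without foreach, using only the parallel package that ships with R. This is a rough sketch, not the approach from the answer: slow_fun, frames and namesets are hypothetical stand-ins for AltAlleleRecounter, the dataframes and the name vectors.

```r
library(parallel)  # ships with R

# Hypothetical stand-in for AltAlleleRecounter: any function of (names, data)
slow_fun <- function(names, data) {
  Sys.sleep(1)  # pretend this is expensive
  data$total <- rowSums(data[, names, drop = FALSE])
  data
}

# Toy inputs, one dataframe and one vector of column names per task
frames   <- list(data.frame(a = 1:3, b = 4:6),
                 data.frame(a = 7:9, b = 1:3))
namesets <- list(c("a", "b"), "a")

cl <- makeCluster(2)  # socket cluster of 2 worker processes, works on Windows

# Task i calls slow_fun(namesets[[i]], frames[[i]]) on a worker;
# clusterMap only returns once every task has finished
results <- clusterMap(cl, slow_fun, namesets, frames)

stopCluster(cl)  # always release the workers
```

If the real function needs packages such as stringr, they would have to be loaded on the workers first, for instance with clusterEvalQ(cl, library(stringr)).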