R assigning function call to two different cores

Posted 2020-08-01 05:55

Question:

So far, everything I've read about parallel processing in R involves splitting the rows of a single data frame across cores.

But what if I have two or three large data frames that I want to run a long function on? Can I assign each call of the function to a specific core so that I don't have to wait for them to run one after another? I'm on Windows.

Let's say this is the function:

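library(stringr) # str_extract()
library(iotools) # dstrfw()

AltAlleleRecounter <- function(names, data) {
  data$AC <- 0
  numalleles <- numeric(length = nrow(data))
  for (i in names) {
    # Take the leading "a/b" genotype (e.g. "0/1") from each sample column
    genotype <- str_extract(data[, i], "^[^/]/[^/]")
    # Split it into three 1-character fields: allele 1, "/", allele 2
    GT <- dstrfw(genotype, c('character', 'character', 'character'), c(1L, 1L, 1L))
    called <- GT$V1 != '.' # skip missing genotypes ("./.")
    # The alleles are read as text, so convert them to numbers before adding
    data$AC[called] <- data$AC[called] + as.integer(GT$V1[called]) + as.integer(GT$V3[called])
    numalleles[called] <- numalleles[called] + 2
  }
  data$AF <- data$AC / numalleles
  return(data)
}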

What I want to do is basically this (generic pseudocode):

wait_till_everything_is_finished(
core1="data1 <- AltAlleleRecounter(sampleset1,data1)",
core2="data2 <- AltAlleleRecounter(sampleset2,data2)",
core3="data3 <- AltAlleleRecounter(sampleset3,data3)"
)

where all three calls run at the same time, but the script does not continue until all of them have finished.

Edit: Bryan's suggestion worked. I replaced "otherList" with my second list (the sample names). Here is the code I used:

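myframelist <- list(data1, data2)
mynameslist <- list(names1, names2)
# One AltAlleleRecounter() call per worker; .packages loads its dependencies on each worker
myframelist <- foreach(i = 1:2, .packages = c("stringr", "iotools")) %dopar%
  AltAlleleRecounter(mynameslist[[i]], myframelist[[i]])
myfilenamelist <- list("data1.tsv", "data2.tsv")
# Write each result to its own file, also in parallel
foreach(i = 1:2) %dopar%
  write.table(myframelist[[i]], file = myfilenamelist[[i]], quote = FALSE,
              sep = "\t", row.names = FALSE, col.names = TRUE)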

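The data variables are data frames and the name variables are just character vectors. Packages loaded in your main session are not loaded on the worker processes, so you may need to load them there as well (for example with foreach's .packages argument).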
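Instead of indexing with i, foreach can also step through several named iterators together (the expression is evaluated as many times as the shortest argument is long), much like mapply(). Below is a minimal sketch, assuming doParallel is installed; recount_stub() and the toy data frames are hypothetical stand-ins for AltAlleleRecounter() and the real data, so the example runs without stringr or iotools.

```r
library(doParallel)  # attaches foreach and parallel as well

# Hypothetical stand-in for AltAlleleRecounter(): sums the named columns per row
recount_stub <- function(names, data) {
  data$AC <- rowSums(data[, names, drop = FALSE])
  data
}

# Hypothetical toy data: two data frames and their matching column names
frames <- list(data.frame(a = 1:3, b = 4:6), data.frame(a = 7:9, c = 1:3))
samples <- list(c("a", "b"), c("a", "c"))

cl <- makeCluster(2)
registerDoParallel(cl)

# Each iteration receives one data frame (d) and its matching names vector (n)
results <- foreach(d = frames, n = samples) %dopar% recount_stub(n, d)

stopCluster(cl)  # release the worker processes
print(results[[1]]$AC)
print(results[[2]]$AC)
```

With the real function, the call would look like foreach(d = myframelist, n = mynameslist, .packages = c("stringr", "iotools")) %dopar% AltAlleleRecounter(n, d).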

Answer 1:

Try something like this:

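library(doParallel) # also attaches foreach and parallel
library(foreach)

cl <- makeCluster(6) ## you can set up as many cores as you need/want/have here; parallel::detectCores() tells you how many you have
registerDoParallel(cl)
getDoParWorkers() # should be the number you registered. If not, something went wrong.

df1 <- data.frame(matrix(1:9, ncol = 3))
df2 <- data.frame(matrix(1:9, ncol = 3))
df3 <- data.frame(matrix(1:9, ncol = 3))
mylist <- list(df1, df2, df3)

otherList <- list(1, 2, 3)

# Element i of both lists is handled by a worker; results come back as a list, in order
mylist <- foreach(i = 1:3) %dopar% (mylist[[i]] * otherList[[i]])
stopCluster(cl) # shut the workers down once all results are back
mylist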

[[1]]
  X1 X2 X3
1  1  4  7
2  2  5  8
3  3  6  9

[[2]]
  X1 X2 X3
1  2  8 14
2  4 10 16
3  6 12 18

[[3]]
  X1 X2 X3
1  3 12 21
2  6 15 24
3  9 18 27

I do this fairly often when topic-modeling different databases. The idea is to put the data you want to apply your function to into lists, then have foreach apply your function to the indexed elements of those lists in parallel. For your example, you will need one list of your data frames and another list of your sample sets.
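If you'd rather not depend on foreach, the base parallel package's clusterMap() pairs up the elements of several lists the way mapply() does and only returns once every call has finished, which matches the "wait till everything is finished" behaviour in the question. Below is a minimal sketch; scale_frame() and the toy data are hypothetical stand-ins for AltAlleleRecounter() and the real data frames.

```r
library(parallel)

# Hypothetical stand-in for AltAlleleRecounter(names, data)
scale_frame <- function(factor, data) data * factor

frames <- list(data.frame(matrix(1:9, ncol = 3)),
               data.frame(matrix(1:9, ncol = 3)),
               data.frame(matrix(1:9, ncol = 3)))
factors <- list(1, 2, 3)

cl <- makeCluster(3)
# For the real function, load its packages on every worker first, e.g.
# clusterEvalQ(cl, { library(stringr); library(iotools) })

# One call per pair of elements; blocks until all calls have returned
results <- clusterMap(cl, scale_frame, factors, frames)

stopCluster(cl)  # release the worker processes
print(results[[3]])
```

clusterMap() returns a plain list (SIMPLIFY = FALSE by default), so each element is one processed data frame, in the same order as the inputs.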