I'm working with a custom random forest function that requires both a starting and ending point in a set of genomic data (about 56k columns).
I'd like to split the column numbers into subgroups and allow each subgroup to be processed individually to speed things up. I tried this (unsuccessfully) with the following code:
library(foreach)
library(doMC)
foreach(startMrk = markers$start, endMrk = markers$end) %dopar%
  rfFunction(genoA, genoB, 0.8, ntree = 100, startMrk = startMrk, endMrk = endMrk)
Here markers$start is a numeric vector: 1 4 8 12 16
and markers$end is another numeric vector: 3 7 11 15 19
For this example, I'd want one core to process columns 1:3, another to process 4:7, and so on. I'm new to parallel processing in R, so I'm more than willing to study any documentation available. Does anyone have advice on what I'm missing, either about parallel processing in general or in the code above?
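To make the intended behavior concrete, here is a minimal sketch of the pattern I'm after. It is not my real code: processRange is a hypothetical stand-in for rfFunction that only returns the column indices it would handle, the markers data frame is rebuilt from the example values above, and the core count of 2 is an assumption. It also registers a backend with registerDoMC, since without a registered backend %dopar% runs sequentially and emits a warning.

```r
library(foreach)
library(doMC)

# Register a parallel backend (assumed: 2 cores). Without a registered
# backend, %dopar% falls back to sequential execution with a warning.
registerDoMC(cores = 2)

# Hypothetical marker table built from the example ranges in the question.
markers <- data.frame(start = c(1, 4, 8, 12, 16),
                      end   = c(3, 7, 11, 15, 19))

# Stand-in for rfFunction(): returns the column indices of one subgroup.
processRange <- function(startMrk, endMrk) {
  seq(startMrk, endMrk)
}

# foreach walks startMrk and endMrk in lockstep, so each task receives
# one (start, end) pair; results come back as a list, one element per pair.
results <- foreach(startMrk = markers$start, endMrk = markers$end) %dopar% {
  processRange(startMrk, endMrk)
}
```

With the real function, the body of the loop would be the rfFunction call, and results would hold one return value per subgroup in the same order as the rows of markers.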