I'm using nested foreach loops with the doSMP package to generate results from a function I developed. Ordinarily the problem would use three nested loops, but because of the size of the results generated (around 80,000 for each i), I've had to pause accumulation and write the results to file whenever the results matrix exceeds a specified number of rows.

    i = 1
    write.off = 1
    while(i <= length(i.vector)){
      # Placeholder first row; dropped again before writing
      results.frame = as.data.frame(matrix(NA, ncol = 3, nrow = 1))
      # Accumulate results for successive i until the frame is large enough to flush
      while(nrow(results.frame) < 500000 && i <= length(i.vector)){
        results = foreach(j = 1:length(j.vector), .combine = "rbind", .inorder = TRUE) %:%
          foreach(k = 1:length(k.vector), .combine = "rbind", .inorder = TRUE) %dopar% {
            ith.value = i.vector[i]
            jth.value = j.vector[j]
            kth.value = k.vector[k]
            my.function(ith.value, jth.value, kth.value)
          }
        results.frame = rbind(results.frame, results)
        i = i + 1
      }
      results.frame = results.frame[-1,]
      # Write this chunk to its own file: part_1, part_2, ...
      write.table(results.frame, paste("part_", write.off, sep = ""))
      write.off = write.off + 1
    }

The problem I'm having is with garbage collection. The workers don't seem to release memory back to the system, so by i = 4 each of them has eaten up around 6 GB of memory.
I've tried inserting gc() into the foreach loop directly as well as into the underlying function, and I've also tried assigning the function and its results to a named environment that I can clear periodically. None of these methods have worked.
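For concreteness, the gc() attempt was along these lines. This is a simplified sketch rather than my exact code: the input vectors and the body of my.function are placeholder stand-ins, and %do% replaces %dopar% so that the snippet runs without registered doSMP workers.

```r
library(foreach)

# Placeholder stand-ins for the real inputs and function (assumptions for this sketch)
j.vector <- 1:3
k.vector <- 1:2
my.function <- function(a, b, c) data.frame(i = a, j = b, k = c)

ith.value <- 10

# %do% is used only so the sketch runs without a parallel backend;
# the real code uses %dopar% with doSMP workers.
results <- foreach(j = seq_along(j.vector), .combine = "rbind") %:%
  foreach(k = seq_along(k.vector), .combine = "rbind") %do% {
    out <- my.function(ith.value, j.vector[j], k.vector[k])
    gc()  # attempted fix: force a collection in the process running this task
    out   # return the task result, not the value of gc()
  }
```

With %dopar%, the gc() call runs inside whichever worker process evaluates the task, which is where I expected it to help.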
I suspect foreach's initEnvir and finalEnvir parameters might offer a solution, but the documentation and examples haven't shed much light on how to use them for this.
I'm running this code on a VM running Windows Server 2008.