One way to parallelize in R is through the snowfall package. To send custom functions to the workers you can use sfExport()
(see Joris' post here).
I have a custom function that depends on functions from non-base packages that are not loaded automagically. Thus, when I run my function in parallel, R craps out because certain functions are not available (think of the packages spatstat, splancs, sp...). So far I've solved this by calling library() in my custom function. This loads the packages on the first run and is presumably just ignored on subsequent iterations. Still, I was wondering if there's another way of telling each worker to load the package on the first iteration and be done with it (or am I missing something and each iteration starts as a tabula rasa?).
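For concreteness, a minimal sketch of that workaround, with library() called inside the custom function; the worker count, the helper name my_density, and the toy use of spatstat's runifpoint()/density() are illustrative assumptions, not part of the original post:

```r
library(snowfall)

# Custom function that needs a non-base package; library() is called inside
# so each worker attaches spatstat the first time the function runs there.
my_density <- function(i, n_points) {
  library(spatstat)               # no-op if spatstat is already attached on this worker
  density(runifpoint(n_points))   # toy spatstat computation
}

sfInit(parallel = TRUE, cpus = 2)   # start two workers
sfExport("my_density")              # ship the custom function to the workers
res <- sfLapply(1:4, my_density, n_points = 100)
sfStop()
```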
I don't understand the question.
Packages are loaded via library(), and most of the parallel execution functions support that. For example, the snow package uses clusterEvalQ(cl, expr) to 'quietly' (i.e. not return a value) evaluate the given expression, here a call to library(), on each node. Most of the parallel execution frameworks have something like that. Why again would you need something different, and what exactly does not work here?
There's a specific command for that in snowfall, sfLibrary(). See also ?"snowfall-tools". Calling library() manually on every node is strongly discouraged. sfLibrary() is basically a wrapper around the solution Dirk gave based on the snow package.
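Under the same illustrative assumptions as above (worker count, helper name, spatstat toy computation), the snowfall version would look roughly like this; note the custom function no longer needs its own library() call:

```r
library(snowfall)

# Custom function; no library() call needed inside any more.
my_density <- function(i, n_points) density(runifpoint(n_points))

sfInit(parallel = TRUE, cpus = 2)
sfLibrary(spatstat)                 # attaches spatstat on every worker
sfExport("my_density")
res <- sfLapply(1:4, my_density, n_points = 100)
sfStop()
```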