is it possible to get the R survey package's `

Posted 2019-07-07 08:28

Question:

being able to multithread on windows would be awesome, but perhaps this problem is harder than i had thought.. :(

inside of survey:::svyby.default there is a block that's either lapply or mclapply depending on multicore=TRUE and your operating system. windows users get forced into the lapply loop no matter what, and i was wondering if there's any way to go down the mclapply path instead, to speed up the computation.

i don't know too much about the innards of parallel processing, but i did some experiments to see if any of the windows-acceptable alternatives would work. first i tried overwriting mclapply with

mclapply <- 
    function( X , FUN , ... ){ 
        clusterApply( 
            x = X , 
            fun = FUN , 
            cl = makeCluster( detectCores() ) , ... ) 
    }

next i used fixInNamespace( svyby.default , "survey" ) to remove the line if (multicore) parallel:::closeAll()

but that only got me to the point where

> svyby(~api99, ~stype, dclus1, svymean , multicore=TRUE )
Error in checkForRemoteErrors(val) :
  3 nodes produced errors; first error: object 'svymean' not found
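
a likely reason for this error: the workers started by makeCluster() are fresh R sessions, so they see neither the packages attached nor the objects defined in the main session. the sketch below shows that behavior using only the parallel package. my_mean is a hypothetical helper invented for illustration. for the svyby case the analogous step would presumably be clusterEvalQ( cl , library( survey ) ), but that is an assumption and untested here.

```r
library(parallel)

# two fresh worker sessions (the default PSOCK type is the one that works on windows)
cl <- makeCluster(2)

# hypothetical helper, defined only in the main session
my_mean <- function(x) sum(x) / length(x)

# look the helper up by name on each worker: it is not there yet
before <- unlist(clusterEvalQ(cl, exists("my_mean")))

# send it over explicitly, then look again
clusterExport(cl, "my_mean", envir = environment())
after <- unlist(clusterEvalQ(cl, exists("my_mean")))

# now the workers can call it
res <- parLapply(cl, list(1:4, 5:10), function(x) my_mean(x))

# always shut the workers down when finished
stopCluster(cl)
```

note that the mclapply replacement above also starts a new cluster on every call and never stops it, so even a working version would leave idle worker processes behind.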

Answer 1:

quoting Dr. Thomas Lumley, author of the R survey package, in response to my inquiry:

No. This approach to parallelising relies on forking, which Windows doesn't support.

It would be necessary to rewrite it to use clusterApply(), and I'm pretty sure the communications overhead would eat the speed gain. With forking, the child process gets a copy of the parent process data for free -- it's all done by the virtual<->physical memory translation hardware -- but with the cluster approach R has to send data to the child process explicitly.
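
to make the explicit data transfer concrete, here is a minimal sketch of a by-group computation with clusterApply(), using only base R and the built-in mtcars data. it is not the survey package's code and not a drop-in fix for svyby; the names chunks, payload_bytes, par_res, and ser_res are chosen for illustration.

```r
library(parallel)

# split a built-in data set by group, loosely like looping over the levels of `by`
chunks <- split(mtcars$mpg, mtcars$cyl)

# number of bytes that have to be serialized and shipped to the workers
payload_bytes <- length(serialize(chunks, NULL))

# create the cluster once; starting worker sessions has its own cost
cl <- makeCluster(2)

# each chunk is sent to a worker, and each result is sent back
par_res <- clusterApply(cl, chunks, mean)

stopCluster(cl)

# serial reference: same computation without any data transfer
ser_res <- lapply(chunks, mean)
```

both versions return the same group means; what differs is the cost of moving the data. with forking the child already has the parent's data, while here every chunk crosses a process boundary, so for large survey designs the transfer may outweigh the gain, as the quoted answer suggests.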