
Calling numpy on parallel processors in IJulia notebook

Posted 2019-08-17 14:37

Question:

I want to run a simple code in the IJulia notebook which uses the python library numpy. I call numpy with PyCall:

using PyCall
@pyimport numpy as np
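For example, a quick call on the main process confirms the binding works (the array here is just an illustration):

np.sum([1, 2, 3]) # returns 6 via numpy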

This works fine. I then want to split the work across several processors. I add the processors:

addprocs(4)

Then, I run N/proc iterations for a function f, where proc is my number of processors. I split the load evenly between the four processors on my computer:

n = round(Int, N/proc);

proc_sum = @parallel (+) for i = 1:proc
    f(n)
end

return proc_sum / proc

Without numpy, this works fine. However, when I try to split the numpy code across different processors, I get the error

ERROR (unhandled task failure): On worker 3:
UndefVarError: np not defined
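For reference, here is a minimal self-contained version of what I am running that reproduces the error; f, N, and the numpy workload are hypothetical stand-ins:

addprocs(4)

using PyCall
@pyimport numpy as np # numpy is only imported on the main process

@everywhere f(n) = np.sum(np.ones(n)) / n # f exists on all processes, but np does not

N = 10^6
proc = nprocs() - 1 # number of worker processes
n = round(Int, N/proc)

proc_sum = @parallel (+) for i = 1:proc
    f(n) # fails on the workers with: np not defined
end
proc_sum / proc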

Is there any way to have numpy work on the other processors? Note that I have Julia 0.5.2, and I have Canopy. I know there have been issues reported before with PyCall and Canopy, but I would greatly prefer keeping Canopy on my machine.

Answer 1:

To expand further on what has been said already: everything you need must be loaded on all the processes. For example:

addprocs(4)
@everywhere using PyCall
@everywhere @pyimport numpy as np

What you wrote errored because all processes tried to use @pyimport, but only the main process had PyCall loaded. If you require many packages for your computations, the easiest approach may be to do all the loading in one script, e.g. load_modules.jl, and then simply run

addprocs(4)
@everywhere include("load_modules.jl")
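For instance, load_modules.jl might contain something along these lines (the contents are a placeholder for whatever your computation needs):

# load_modules.jl -- everything each process needs
using PyCall
@pyimport numpy as np
# plus any other packages and helper definitions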

EDIT: It seems that using is not very robust in combination with @everywhere (fixed in Julia 0.6, see here). What seems to work better is:

addprocs(4)
import PyCall
@everywhere using PyCall
@everywhere @pyimport numpy as np
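Once that has run, numpy is available on every process; a quick sanity check (the call itself is arbitrary):

fetch(@spawnat 2 np.sum([1, 2, 3])) # evaluates on worker 2 and returns 6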



Answer 2:

I'm expanding on Lyndon's comment to provide a more comprehensive answer.

As per the documentation, processes are independent and each has its own workspace. Therefore, any functions, modules or variables that a process needs must be made available to that process first.

If you want to make something available to all existing processes, you can use the @everywhere macro; clearly, to make something available to "all existing processes", these processes need to have been created first.

So:

addprocs(4); # create 4 extra processes (i.e. workers); this is in addition
             # to the main process that handles the REPL

@everywhere import PyCall
@everywhere PyCall.@pyimport numpy as np # load module on _all_ 5 active processes
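With numpy loaded on every process, the parallel reduction from the question goes through. A minimal sketch, assuming a stand-in workload f:

@everywhere f(n) = np.sum(np.ones(n)) / n # hypothetical numpy-based workload

proc = nprocs() - 1
proc_sum = @parallel (+) for i = 1:proc
    f(10^5)
end
proc_sum / proc # gives 1.0, confirming numpy ran on every worker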