I want to run some simple code in an IJulia notebook that uses the Python library numpy. I load numpy through PyCall:
using PyCall
@pyimport numpy as np
This works fine. I then want to split the work across several processors. I add the worker processes:
addprocs(4)
Then, I run N/proc iterations for a function f, where proc is my number of processors. I split the load evenly between the four processors on my computer:
n=round(Int,N/proc);
proc_sum = @parallel (+) for i=1:proc
    f(n)
end
return proc_sum / proc
Without numpy, this works fine. However, when I try to split the code with numpy to different processors, I get the error
ERROR (unhandled task failure): On worker 3:
UndefVarError: np not defined
Is there any way to have numpy work on the other processors? Note that I have Julia 0.5.2, and I have Canopy. I know there have been issues reported before with PyCall and Canopy, but I would greatly prefer keeping Canopy on my machine.
To further expand on what has been said already, everything you need should be loaded in all the processes. For example:
addprocs(4)
@everywhere using PyCall
@everywhere @pyimport numpy as np
What you wrote errored because all processes tried to use @pyimport but only the main process had PyCall loaded. If you need many packages for your computations, it may be easier to do all the loading in one script, e.g. load_modules.jl, and then simply run
addprocs(4)
@everywhere include("load_modules.jl")
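As a rough sketch, load_modules.jl could look like the following. The file contents are an assumption, since only the file name is given above; it simply collects the loading statements so that each process runs them once.

```julia
# load_modules.jl -- hypothetical contents; only the file name comes from the answer.
# Put everything a worker needs here, so a single include sets up each process.
using PyCall
@pyimport numpy as np
```

Any other packages or helper functions the workers need would go in the same file.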
EDIT: It seems that using is not very robust with @everywhere (fixed in Julia 0.6). What seems to work better is:
addprocs(4)
import PyCall
@everywhere using PyCall
@everywhere @pyimport numpy as np
I'm expanding on Lyndon's comment to provide a more comprehensive answer.
As per the documentation, processes are independent, and thus rely on their own, independent workspace. Therefore, any functions, modules or variables that will be needed by a process need to be made available to that process first.
If you want to make something available to all existing processes, you can use the @everywhere macro; clearly, to make something available to "all existing processes", these processes need to have been created first.
So:
addprocs(4); # create 4 extra processes (i.e. workers); this is in addition
# to the main process that handles the REPL
@everywhere import PyCall
@everywhere PyCall.@pyimport numpy as np # load module on _all_ 5 active processes
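Putting the pieces together with the loop from the question, a minimal end-to-end sketch might look like this. It assumes the Julia 0.5-era syntax used throughout this thread; f here is a hypothetical stand-in for the question's function, and run_parallel is a name made up for illustration.

```julia
addprocs(4)                        # create the workers first

import PyCall                      # load on the main process before the workers
@everywhere using PyCall
@everywhere @pyimport numpy as np  # np is now defined on every process

# Hypothetical stand-in for the question's f; it must also be defined everywhere.
@everywhere f(n) = np.mean(np.sqrt(collect(1.0:n)))

# Hypothetical wrapper around the question's loop, so n and proc are locals
# that get captured and sent to the workers along with the loop body.
function run_parallel(N)
    proc = nworkers()
    n = round(Int, N / proc)
    proc_sum = @parallel (+) for i = 1:proc
        f(n)
    end
    return proc_sum / proc
end

println(run_parallel(1000))
```

The order matters: addprocs comes first, then the @everywhere lines, because @everywhere only reaches processes that already exist.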