I have tried multiple approaches to pickle a Python function with dependencies, following many recommendations on StackOverflow (such as dill, cloudpickle, etc.), but all seem to run into a fundamental issue that I cannot figure out.
I have a main module that tries to pickle a function from an imported module and send it over ssh to be unpickled and executed on a remote machine.
So main has:
import dill  # for example; other serializers were tried too
import modulea
serial = dill.dumps(modulea.func)
send(serial)  # placeholder: ship the bytes over ssh (transport not shown)
On the remote machine:
import dill
serial = receive()  # placeholder: bytes arriving over ssh (transport not shown)
funcremote = dill.loads(serial)
funcremote()
If the functions being pickled and sent are top-level functions defined in main itself, everything works. When they are in an imported module, dill.loads fails with messages of the type "module modulea not found".
It appears that the module name is pickled along with the function name. I do not see any way to "fix up" the pickle to remove the dependency, or alternatively, to create a dummy module on the receiver to become the recipient of the unpickling.
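The behaviour described above can be reproduced with the standard-library pickle module, used here as a stand-in for dill: for a function defined in an importable module, both record only a reference to it. In this sketch, modulea is a hypothetical module built at runtime with types.ModuleType so the example is self-contained. It shows that the payload holds only the names "modulea" and "func", that loading fails when the receiver lacks the module, and that a dummy module does make loading succeed, but the receiver then gets the dummy's function rather than the sender's code.

```python
import pickle
import sys
import types

# Hypothetical stand-in for "modulea", built at runtime instead of from a file.
modulea = types.ModuleType("modulea")
exec("def func():\n    return 'real func from the sender'\n", modulea.__dict__)
sys.modules["modulea"] = modulea

# Sender: the pickle records only the names ("modulea", "func"), not the code.
serial = pickle.dumps(modulea.func)
print(b"modulea" in serial, b"func" in serial)  # True True

# Receiver without modulea: unpickling must import it, and fails.
del sys.modules["modulea"]
error = None
try:
    pickle.loads(serial)
except ModuleNotFoundError as exc:
    error = exc
    print("receiver error:", exc)  # receiver error: No module named 'modulea'

# Receiver with a dummy "modulea": loading succeeds, but resolves to the dummy's func.
stub = types.ModuleType("modulea")
stub.func = lambda: "stub func on the receiver"
sys.modules["modulea"] = stub
funcremote = pickle.loads(serial)
print(funcremote())  # stub func on the receiver
```

So a dummy module only "works" if it already contains the code you wanted to send, which is the same as installing the module on the remote machine.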
Any pointers will be much appreciated.
--prasanna
I'm the dill author. I do this exact thing over ssh, but with success. Currently, dill and any of the other serializers pickle modules by reference… so to successfully pass a function defined in a file, you have to ensure that the relevant module is also installed on the other machine. I do not believe there is any object serializer that serializes modules directly (i.e. not by reference).
Having said that, dill does have some options to serialize object dependencies. For example, for class instances, the default in dill is to not serialize class instances by reference… so the class definition can also be serialized and sent with the instance. In dill, you can also (use a very new feature to) serialize file handles by serializing the file, instead of doing so by reference. But again, if you have the case of a function defined in a module, you are out of luck, as modules are serialized by reference pretty darn universally.
You might still be able to use dill to do this, just not by pickling the object; instead, extract the source and send the source code. In pathos.pp and pyina, dill is used to extract the source and the dependencies of any object (including functions), and pass them to another computer/process/etc. However, since this is not an easy thing to do, dill can also fall back to extracting a relevant import and sending that instead of the source code.
Hopefully you can understand that this is a messy, messy thing to do (as noted in one of the dependencies of the function I am extracting below). However, what you are asking is successfully done in the pathos package to pass code and dependencies to different machines across ssh-tunneled ports.
I imagine something could also be built around the dill.detect.parents method, which provides a list of pointers to all parent objects for any given object… and one could reconstruct all of any function's dependencies as objects… but this is not implemented.
BTW: to establish an ssh tunnel, just do this:
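A minimal sketch of opening a local port-forwarding tunnel from Python by invoking the OpenSSH client, assuming ssh is on the PATH and key-based login is already set up. The function names, user, host and port values are hypothetical; -L forwards a local port to a port reachable from the remote host, and -N tells ssh not to run a remote command.

```python
import shlex
import subprocess


def tunnel_command(user, host, local_port, remote_port, remote_host="localhost"):
    """Build an OpenSSH argv that forwards local_port to remote_host:remote_port,
    where remote_host is resolved on `host` (hypothetical helper)."""
    return [
        "ssh",
        "-N",  # forward ports only, do not run a remote command
        "-L", f"{local_port}:{remote_host}:{remote_port}",
        f"{user}@{host}",
    ]


def open_tunnel(user, host, local_port, remote_port):
    # Start ssh as a child process; the tunnel stays up while it runs.
    return subprocess.Popen(tunnel_command(user, host, local_port, remote_port))


# Hypothetical values: replace with your own user, host and ports.
cmd = tunnel_command("user", "remote.example.com", 5555, 5555)
print(shlex.join(cmd))  # ssh -N -L 5555:localhost:5555 user@remote.example.com
```

open_tunnel returns the Popen handle, so calling terminate() on it closes the tunnel when you are done.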
Then you can work across the local port with ZMQ, or ssh, or whatever. If you want to do so with ssh, pathos also has that built in.
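The source-shipping idea described above (send source plus a relevant import, not a pickle) can be approximated with the standard library alone. This is not how dill or pathos implement it: it swaps in inspect.getsource for the source and the function's __code__.co_names and __globals__ to find modules it refers to, which become import lines. The module name modulea, its body, and the helper import_lines are hypothetical; the file is written to a temp directory so the sketch runs standalone. It ignores closures, nested functions and non-module globals, which is exactly where this gets messy.

```python
import importlib
import inspect
import sys
import tempfile
import textwrap
import types
from pathlib import Path

# --- Sender side ---
# Hypothetical modulea.py, written to a temp dir so the sketch is self-contained.
tmp = Path(tempfile.mkdtemp())
(tmp / "modulea.py").write_text(textwrap.dedent("""\
    import math

    def func(x):
        return math.sqrt(x) + 1
"""))
sys.path.insert(0, str(tmp))
importlib.invalidate_caches()
sys.modules.pop("modulea", None)  # make sure the file above is what gets imported
modulea = importlib.import_module("modulea")


def import_lines(func):
    """Turn module-level *modules* that func refers to into import statements.

    Only inspects func.__code__.co_names (not nested code objects) and
    skips globals that are not modules.
    """
    lines = []
    for name in func.__code__.co_names:
        obj = func.__globals__.get(name)
        if isinstance(obj, types.ModuleType):
            stmt = f"import {obj.__name__}"
            if name != obj.__name__:
                stmt += f" as {name}"
            lines.append(stmt)
    return lines


# The payload is plain text, so it can be sent over ssh like any other bytes.
payload = {
    "name": modulea.func.__name__,
    "source": "\n".join(import_lines(modulea.func)) + "\n" + inspect.getsource(modulea.func),
}

# --- Receiver side ---
# Only the text in payload is needed; modulea is never imported here.
namespace = {}
exec(payload["source"], namespace)
funcremote = namespace[payload["name"]]
print(funcremote(9.0))  # 4.0
```

Note that inspect.getsource returns only the function's own lines, so without import_lines the receiver would fail with a NameError on math.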