As a follow up to this question: Is there an easy way to pickle a python function (or otherwise serialize its code)?
I would like to see an example of this bullet from the above post:
"If the function references globals (including imported modules, other functions etc) that you need to pick up, you'll need to serialise these too, or recreate them on the remote side. My example just gives it the remote process's global namespace."
I have a simple test going where I am writing a function's bytecode to a file using marshal:
import marshal
def g(self,blah):
    print blah
def f(self):
    for i in range(1,5):
        print 'some function f'
        g('some string used by g')
data = marshal.dumps(f.func_code)
file = open('/tmp/f2.txt', 'w')
file.write(data)
Then starting a fresh python instance I do:
import marshal, types
file = open('/tmp/f2.txt', 'r')
code = marshal.loads(file.read())
func2 = types.FunctionType(code, globals(), "some_func_name")
func2('blah')
This results in a:
NameError: global name 'g' is not defined
This happens no matter how I try to include g. I have tried sending g over in basically the same way as f, but f still cannot see g. How do I get g into the global namespace so that it can be used by f in the receiving process?
Someone also recommended looking at pyro as an example of how to do this. I have already made an attempt at trying to understand the related code in the disco project. I took their dPickle class and tried to recreate their disco/tests/test_pickle.py functionality in a standalone app without success. My experiment had problems doing the function marshaling with the dumps call. Anyway, maybe a pyro exploration is next.
In summary, the basic functionality I am after is being able to send a method over the wire and have all the basic "workspace" methods sent over with it (like g).
Example with changes from answer:
Working function_writer:
import marshal, types
def g(blah):
    print blah
def f():
    for i in range(1,5):
        print 'some function f'
        g('blah string used by g')
f_data = marshal.dumps(f.func_code)
g_data = marshal.dumps(g.func_code)
f_file = open('/tmp/f.txt', 'w')
f_file.write(f_data)
g_file = open('/tmp/g.txt', 'w')
g_file.write(g_data)
Working function_reader:
import marshal, types
f_file = open('/tmp/f.txt', 'r')
g_file = open('/tmp/g.txt', 'r')
f_code = marshal.loads(f_file.read())
g_code = marshal.loads(g_file.read())
f = types.FunctionType(f_code, globals(), 'f')
g = types.FunctionType(g_code, globals(), 'g')
f()
The cloud package does this -- just 'pip install cloud' and then:
In other words, just call cloudpickle.dump() or cloudpickle.dumps() the same way you'd use pickle.*, then later use the native pickle.load() or pickle.loads() to thaw.
Picloud released the 'cloud' python package under the LGPL, and other open-source projects are already using it (google for "cloudpickle.py" to see a few). The documentation at picloud.com gives you an idea how powerful this code is, and why they had an incentive to put the effort into making general-purpose code pickling work -- their whole business is built around it. The idea is that if you have cpu_intensive_function() and want to run it on Amazon's EC2 grid, you just replace:
with:
The latter uses cloudpickle to pickle up any dependent code and data, ships it to EC2, runs it, and returns the results to you when you call cloud.result(). (Picloud bills in millisecond increments, it's cheap as heck, and I use it all the time for monte carlo simulations and financial time series analysis, when I need hundreds of CPU cores for just a few seconds each. I can't say enough good things about it and I don't even work there.)
Assign it to the global name g. (I see you are assigning f to func2 rather than to f. If you are doing something like that with g, then it is clear why f can't find g. Remember that name resolution happens at runtime -- g isn't looked up until you call f.)
Of course, I'm guessing since you didn't show the code you're using to do this.
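The point about runtime name resolution can be illustrated with a small sketch. It is written in Python 3 syntax (`__code__` instead of the Python 2 `func_code`), round-trips the code objects in memory rather than through files, and uses a hypothetical dictionary called `namespace` in place of the receiving process's globals. The functions return strings instead of printing so the result is easy to inspect.

```python
import marshal
import types

def g(text):
    return "g got: " + text

def f():
    # g is a global name here; it is only looked up when f actually runs
    return g("hello")

# Round-trip both code objects through marshal, as the question does with files.
f_code = marshal.loads(marshal.dumps(f.__code__))
g_code = marshal.loads(marshal.dumps(g.__code__))

namespace = {}  # hypothetical stand-in for the receiver's global namespace
f2 = types.FunctionType(f_code, namespace, "f")

try:
    f2()
    lookup_failed = False
except NameError:
    # g has not been bound in f2's globals yet, so the lookup fails
    lookup_failed = True

# Bind the rebuilt g under the exact name that f's code refers to.
namespace["g"] = types.FunctionType(g_code, namespace, "g")
result = f2()
```

The first call fails and the second succeeds even though `f2` itself is never rebuilt; only the contents of its globals dictionary change between the two calls.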
It might be best to create a separate dictionary to use for the global namespace for the functions you're unpickling -- a sandbox. That way all their global variables will be separate from the module you're doing this in. So you might do something like this:
In this example I assume that you've put the code objects from all your functions in one file, one after the other, and when reading them in, I get the code object's name and use it as the basis for both the function object's name and the name under which it's stored in the sandbox dictionary.
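A minimal sketch of the sandbox approach described above, not necessarily the answer's exact code. It uses Python 3 syntax (`__code__` rather than `func_code`), and an in-memory `io.BytesIO` buffer stands in for the file that holds the code objects one after the other. The functions `f` and `g` here are illustrative and deliberately avoid built-ins, since the sandbox dictionary starts out empty.

```python
import io
import marshal
import types

def f(x):
    return x + 1

def g(x):
    return f(x) ** 2

# Sender: write the code objects one after the other into a single stream.
buffer = io.BytesIO()
for func in (f, g):
    marshal.dump(func.__code__, buffer)

buffer.seek(0)

# Receiver: rebuild every function inside one shared dictionary.
sandbox = {}
while True:
    try:
        code = marshal.load(buffer)
    except EOFError:
        break  # no more code objects in the stream
    # The code object's own name is used for the function and the dict key.
    sandbox[code.co_name] = types.FunctionType(code, sandbox, code.co_name)

result = sandbox["g"](3)  # g finds f through the sandbox: (3 + 1) ** 2
```

Because every rebuilt function receives the same dictionary as its globals, the order in which the code objects arrive does not matter; the names only need to be present by the time a function is called.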
Inside the unpickled functions, the sandbox dictionary is their globals(), and so inside f(), g gets its value from sandbox["g"]. To call f then would be: sandbox["f"]("blah")
Every module has its own globals; there are no universal globals. We can "implant" the restored functions into some module and then use it like a normal module.
-- save --
-- restore --
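The save and restore halves can be sketched together as follows. This is an illustration under stated assumptions rather than the answer's original listing: it uses Python 3 syntax, a list of byte strings stands in for the file, and the empty module is created with `types.ModuleType` under the hypothetical name "sandbox".

```python
import marshal
import sys
import types

def f(x):
    return x + 1

def g(x):
    return f(x) ** 2

# -- save --: serialize the code objects (a list stands in for the file)
saved = [marshal.dumps(func.__code__) for func in (f, g)]

# -- restore --: create an empty module and implant the functions into it
sandbox = types.ModuleType("sandbox")  # hypothetical module name
sys.modules["sandbox"] = sandbox       # lets "import sandbox" find it later

for blob in saved:
    code = marshal.loads(blob)
    # The module's __dict__ becomes the globals of each restored function.
    func = types.FunctionType(code, sandbox.__dict__, code.co_name)
    func.__module__ = "sandbox"
    setattr(sandbox, code.co_name, func)

result = sandbox.g(3)  # g resolves f as a global of the sandbox module
```

After this, `sandbox.f` and `sandbox.g` behave like functions defined in an ordinary module, and their globals are isolated from the module doing the restoring.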
Edited:
You can also import a module, e.g. "sys", into the "sandbox" namespace from outside:
or, equivalently:
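Both forms can be sketched like this, in Python 3 syntax. The module objects `sandbox_a` and `sandbox_b` and the function `report` are hypothetical names used only for illustration; `report` refers to `sys` as a global and is rebuilt against a sandbox namespace to show that the name is found there.

```python
import types

sandbox_a = types.ModuleType("sandbox_a")
sandbox_b = types.ModuleType("sandbox_b")

# Form 1: set the attribute directly on the module object.
sandbox_a.sys = __import__("sys")

# Form 2: run an import statement inside the module's namespace.
exec("import sys", sandbox_b.__dict__)

def report():
    # sys is not imported in this file; it is looked up in the function's
    # globals at call time, which will be a sandbox namespace below
    return sys.maxsize

restored_a = types.FunctionType(report.__code__, sandbox_a.__dict__, "report")
restored_b = types.FunctionType(report.__code__, sandbox_b.__dict__, "report")

value_a = restored_a()
value_b = restored_b()
```

Either form leaves the name `sys` bound in the sandbox's `__dict__`, which is all a restored function needs in order to use it.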
Your original code would work if you ran it in a Python program or the normal Python interactive interpreter rather than in interactive IPython.
IPython uses a special namespace that is not the __dict__ of any module in sys.modules. Normal Python, or any main program, uses sys.modules['__main__'].__dict__ as globals(). Any other module uses that_module.__dict__, which is also OK; only interactive IPython is a problem.
Dill (along with other pickle variants, cloudpickle, etc.) seems to work when the functions being pickled are in the main module along with the pickling. If you are pickling a function from another module, that module name has to be present when the unpickling happens. I cannot seem to find a way around this limitation.
You can get a better handle on global objects by importing __main__ and using the methods available in that module. This is what dill does in order to serialize almost anything in Python. Basically, when dill serializes an interactively defined function, it uses some name mangling on __main__ on both the serialization and deserialization side that makes __main__ a valid module.
Actually, dill registers its types into the pickle registry, so if you have some black box code that uses pickle and you can't really edit it, then just importing dill can magically make it work without monkeypatching the 3rd party code.
Or, if you want the whole interpreter session sent over as a "python image", dill can do that too.
You can easily send the image across ssh to another computer, and start where you left off there as long as there's version compatibility of pickle and the usual caveats about python changing and things being installed.