I have a pickling problem. I want to serialize a function in my main script, then load it and run it in another script. To demonstrate this, I've made 2 scripts:
Attempt 1: The naive way:
dill_pickle_script_1.py
import pickle
import time

def my_func(a, b):
    time.sleep(0.1)  # The purpose of this will become evident at the end
    return a+b

if __name__ == '__main__':
    with open('testfile.pkl', 'wb') as f:
        pickle.dump(my_func, f)
dill_pickle_script_2.py
import pickle

if __name__ == '__main__':
    with open('testfile.pkl', 'rb') as f:  # binary mode, matching 'wb' in script 1
        func = pickle.load(f)
    assert func(1, 2)==3
Problem: when I run script 2, I get AttributeError: 'module' object has no attribute 'my_func'. I understand why: pickle stores a function by reference (its module name plus its name), and when my_func is serialized in script 1, that module is __main__. When dill_pickle_script_2 loads the pickle, __main__ is dill_pickle_script_2 itself; it can't know that __main__ referred to the namespace of dill_pickle_script_1, and therefore cannot find my_func.
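To see that pickle records a module-level function as a reference rather than as code, you can inspect the bytes. This sketch uses a standard-library function (json.dumps) so it runs anywhere; a function defined in a script that is run directly would be recorded the same way, just with __main__ as the module name. Protocol 0 is chosen only because its opcodes are readable text.

```python
import json
import pickle
import pickletools

# Protocol 0 writes references as text, which makes them easy to read.
data = pickle.dumps(json.dumps, protocol=0)

# The disassembly shows a GLOBAL opcode with the argument 'json dumps':
# only the module name and attribute name are stored, no bytecode.
pickletools.dis(data)

# Loading re-imports json and looks up 'dumps' again.
restored = pickle.loads(data)
print(restored is json.dumps)  # True
```

The same lookup is what fails in script 2: the pickle says "__main__ my_func", and the loading script's __main__ has no such attribute.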
Attempt 2: Inserting an absolute import
I fixed the problem with a little hack: in dill_pickle_script_1, I import my_func by its absolute module path before pickling it.
dill_pickle_script_1.py
import pickle
import time

def my_func(a, b):
    time.sleep(0.1)
    return a+b

if __name__ == '__main__':
    from dill_pickle_script_1 import my_func  # Added absolute import
    with open('testfile.pkl', 'wb') as f:
        pickle.dump(my_func, f)
Now it works! However, I'd like to avoid doing this hack every time. (Also, I want the pickling to happen inside some other module, which wouldn't know which module my_func came from.)
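The hack works because the absolutely imported function carries the real module name in its __module__ attribute, and that name is what pickle records. A small self-contained sketch of the same effect, using a throwaway module called demo_script_1 (a hypothetical stand-in for dill_pickle_script_1) written to a temporary directory:

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Write a stand-in for dill_pickle_script_1 to a temp dir and make it importable.
tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / "demo_script_1.py").write_text(
    "def my_func(a, b):\n"
    "    return a + b\n"
)
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()

# Equivalent of `from demo_script_1 import my_func` in Attempt 2.
module = importlib.import_module("demo_script_1")
my_func = module.my_func
print(my_func.__module__)  # demo_script_1

# The pickle now names an importable module instead of '__main__'.
data = pickle.dumps(my_func, protocol=0)
print(data.startswith(b"cdemo_script_1\nmy_func\n"))  # True
print(pickle.loads(data)(1, 2))  # 3
```

Any other process that can import demo_script_1 can load this pickle, which is why script 2 succeeds once script 1 pickles the imported copy.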
Attempt 3: Dill
I heard that the package dill lets you serialize things in __main__ and load them elsewhere, so I tried that:
dill_pickle_script_1.py
import dill
import time

def my_func(a, b):
    time.sleep(0.1)
    return a+b

if __name__ == '__main__':
    with open('testfile.pkl', 'wb') as f:
        dill.dump(my_func, f)
dill_pickle_script_2.py
import dill

if __name__ == '__main__':
    with open('testfile.pkl', 'rb') as f:  # binary mode, matching 'wb' in script 1
        func = dill.load(f)
    assert func(1, 2)==3
Now, however, I have another problem: when running dill_pickle_script_2.py, I get NameError: global name 'time' is not defined. It seems that dill did not realize that my_func references the time module, which has to be imported on load.
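One way to see where this NameError comes from: when a function is shipped by value, its code travels but the module-level names it uses (here time) do not, unless the serializer captures them too. The sketch below imitates by-value shipping with the standard library's marshal and types.FunctionType, rebuilding the function with globals that contain only builtins. It illustrates the failure mode and is not dill's actual mechanism; the function names are made up for the example. It also shows one common workaround: moving the import into the function body.

```python
import builtins
import marshal
import time
import types

def uses_global_time(a, b):
    time.sleep(0.01)  # looks up 'time' in the function's globals at call time
    return a + b

def uses_local_time(a, b):
    import time  # the import is part of the code object, so it travels with it
    time.sleep(0.01)
    return a + b

def ship_by_value(func):
    """Serialize only the function's code object (no globals)."""
    return marshal.dumps(func.__code__)

def load_by_value(blob):
    """Rebuild a function whose globals hold builtins and nothing else."""
    code = marshal.loads(blob)
    return types.FunctionType(code, {"__builtins__": builtins})

works = load_by_value(ship_by_value(uses_local_time))
print(works(1, 2))  # 3

broken = load_by_value(ship_by_value(uses_global_time))
error_message = None
try:
    broken(1, 2)
except NameError as exc:
    error_message = str(exc)
print(error_message)  # name 'time' is not defined
```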
My question
How can I serialize an object in __main__ and load it again in another script so that all the imports used by that object are also loaded, without the nasty little hack from Attempt 2?
Well, I found a solution. It is a horrible but tidy kludge and not guaranteed to work in all cases. Any suggestions for improvement are welcome. The solution involves replacing the __main__ reference with an absolute module reference in the pickle string, using the following helper functions:
Now, I can simply change dill_pickle_script_1.py to say:

And then dill_pickle_script_2.py works!

I ran into the same problem and finally found a solution that works for me.
The following code dynamically re-imports the passed object from the main module if that object refers to "__main__" (i.e., was defined in the main module).

The other posted answer didn't work for me, since the dumps method didn't return an ASCII string for me (thus, __main__ couldn't be replaced).
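That code is not preserved in this copy either. Below is a hypothetical sketch of the described approach: a helper (named reimport_from_main here, an invented name) that, for an object whose __module__ is __main__, imports the running script under its file name and returns the same-named attribute from that module. The driver writes the question's two scripts to a temporary directory and runs each in its own interpreter to check the round trip with plain pickle. Assumptions: the script's file name is a valid module name, its directory is on sys.path (true when it is run directly), its top-level work is guarded by `if __name__ == '__main__':` because importing it re-executes it, and only top-level names are handled.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

SCRIPT_1 = textwrap.dedent("""\
    import importlib
    import pickle
    import sys
    import time
    from pathlib import Path

    def reimport_from_main(obj):
        # Hypothetical helper: swap an object defined in __main__ for the same
        # object imported under the script's real module name.
        if getattr(obj, "__module__", None) != "__main__":
            return obj
        module_name = Path(sys.modules["__main__"].__file__).stem
        module = importlib.import_module(module_name)  # re-runs top-level code
        return getattr(module, obj.__qualname__)       # top-level names only

    def my_func(a, b):
        time.sleep(0.1)
        return a + b

    if __name__ == "__main__":
        with open("testfile.pkl", "wb") as f:
            pickle.dump(reimport_from_main(my_func), f)
    """)

SCRIPT_2 = textwrap.dedent("""\
    import pickle

    if __name__ == "__main__":
        with open("testfile.pkl", "rb") as f:
            func = pickle.load(f)
        assert func(1, 2) == 3
        print(func.__module__, func(1, 2))
    """)

workdir = Path(tempfile.mkdtemp())
(workdir / "dill_pickle_script_1.py").write_text(SCRIPT_1)
(workdir / "dill_pickle_script_2.py").write_text(SCRIPT_2)

# Run each script in its own interpreter, as in the question.
for script in ("dill_pickle_script_1.py", "dill_pickle_script_2.py"):
    result = subprocess.run([sys.executable, script], cwd=workdir,
                            capture_output=True, text=True, check=True)
print(result.stdout.strip())  # dill_pickle_script_1 3
```

Because script 2 imports dill_pickle_script_1 while unpickling, that module's own `import time` runs as well, so the time.sleep call inside my_func resolves without any extra handling.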