I have a pickling problem. I want to serialize a function in my main script, then load it and run it in another script. To demonstrate this, I've made two scripts:
Attempt 1: The naive way:
dill_pickle_script_1.py
```python
import pickle
import time

def my_func(a, b):
    time.sleep(0.1)  # The purpose of this will become evident at the end
    return a + b

if __name__ == '__main__':
    with open('testfile.pkl', 'wb') as f:
        pickle.dump(my_func, f)
```
dill_pickle_script_2.py
```python
import pickle

if __name__ == '__main__':
    with open('testfile.pkl', 'rb') as f:  # pickles are binary, so open in 'rb'
        func = pickle.load(f)
    assert func(1, 2) == 3
```
Problem: when I run script 2, I get `AttributeError: 'module' object has no attribute 'my_func'`. I understand why: pickle stores a function by reference (its module name plus its function name), not by value. When `my_func` is serialized in script 1, it belongs to the `__main__` module. `dill_pickle_script_2` can't know that `__main__` there referred to the namespace of `dill_pickle_script_1`, so it cannot find the reference.
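The by-reference behaviour can be reproduced in a single file. This is a minimal sketch (Python 3.7+, standard library only) that pickles a function defined in `__main__`, checks that the bytes hold only the module and function names, and then stands in for script 2 by unpickling in a fresh interpreter started with `python -c`, whose `__main__` has no `my_func`:

```python
import pickle
import subprocess
import sys


def my_func(a, b):
    return a + b


data = pickle.dumps(my_func)

# pickle stores a function by reference: the bytes hold its module name
# and its name, not its code.
print(my_func.__module__)                    # __main__ when run as a script
print(b"__main__" in data, b"my_func" in data)

# Stand-in for script 2: a fresh interpreter whose __main__ has no my_func.
result = subprocess.run(
    [sys.executable, "-c",
     "import pickle, sys; pickle.loads(sys.stdin.buffer.read())"],
    input=data,
    capture_output=True,
)
print(result.returncode != 0)                # True: unpickling failed
print(result.stderr.decode().strip().splitlines()[-1])
```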
Attempt 2: Inserting an absolute import
I fixed the problem with a little hack: before pickling, I import `my_func` from `dill_pickle_script_1` by its absolute module name, so the pickled reference points at `dill_pickle_script_1` rather than `__main__`.
dill_pickle_script_1.py
```python
import pickle
import time

def my_func(a, b):
    time.sleep(0.1)
    return a + b

if __name__ == '__main__':
    from dill_pickle_script_1 import my_func  # Added absolute import
    with open('testfile.pkl', 'wb') as f:
        pickle.dump(my_func, f)
```
Now it works! However, I'd like to avoid having to do this hack every time. (Also, I want my pickling to be done inside some other module, which wouldn't know which module `my_func` came from.)
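Why the absolute import helps can be checked without the two scripts. The sketch below (standard library only; the module name `demo_funcs` and the temporary directory are made up for illustration) writes a stand-in for `dill_pickle_script_1.py` to disk, imports it by name, pickles `my_func`, and unpickles it in a fresh interpreter. Because the function's `__module__` is now `demo_funcs` instead of `__main__`, the loader imports that module, and `time` along with it, so the call succeeds:

```python
import importlib
import pickle
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-in for dill_pickle_script_1.py, written to a temp dir.
tmpdir = tempfile.mkdtemp()
Path(tmpdir, "demo_funcs.py").write_text(textwrap.dedent("""
    import time

    def my_func(a, b):
        time.sleep(0.1)
        return a + b
"""))
sys.path.insert(0, tmpdir)
importlib.invalidate_caches()

# Importing by module name (the "absolute import" of Attempt 2) yields a
# function whose __module__ is "demo_funcs", so that is what pickle records.
demo_funcs = importlib.import_module("demo_funcs")
data = pickle.dumps(demo_funcs.my_func)

# A fresh interpreter can resolve demo_funcs.my_func as long as the module
# is importable there (here: the same directory is put on sys.path).
loader = (
    "import pickle, sys; "
    f"sys.path.insert(0, {tmpdir!r}); "
    "func = pickle.loads(sys.stdin.buffer.read()); "
    "print(func(1, 2))"
)
result = subprocess.run([sys.executable, "-c", loader], input=data,
                        capture_output=True)
print(demo_funcs.my_func.__module__)   # demo_funcs
print(result.stdout.decode().strip())  # 3
```

The trade-off is the one described above: the loading process must be able to import the defining module under the same name.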
Attempt 3: Dill
I heard that the `dill` package lets you serialize things in `__main__` and load them elsewhere, so I tried that:
dill_pickle_script_1.py
```python
import dill
import time

def my_func(a, b):
    time.sleep(0.1)
    return a + b

if __name__ == '__main__':
    with open('testfile.pkl', 'wb') as f:
        dill.dump(my_func, f)
```
dill_pickle_script_2.py
```python
import dill

if __name__ == '__main__':
    with open('testfile.pkl', 'rb') as f:  # pickles are binary, so open in 'rb'
        func = dill.load(f)
    assert func(1, 2) == 3
```
Now, however, I have another problem: when running `dill_pickle_script_2.py`, I get `NameError: global name 'time' is not defined`. It seems that dill did not realize that `my_func` references the `time` module and has to import it on load.
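To see what the loading side is missing, here is a rough, stdlib-only illustration of pickling a function by value. It is not how dill is implemented: `dump_by_value` and `load_by_value` are hypothetical helpers, they use `marshal` (whose format is tied to the Python version and is not safe for untrusted data), and they only handle plain top-level functions without defaults or closures. Restoring the code object alone reproduces the `NameError`; recording which module each referenced global name is bound to, and re-importing it on load, makes the call work:

```python
import builtins
import importlib
import marshal
import time
import types


def my_func(a, b):
    time.sleep(0.1)
    return a + b


def dump_by_value(func):
    """Hypothetical helper: serialize the code object plus the names of the
    modules that its global names are bound to."""
    code = func.__code__
    # co_names mixes global names ("time") with attribute names ("sleep"),
    # so keep only names bound to modules in the function's globals.
    modules = {
        name: func.__globals__[name].__name__
        for name in code.co_names
        if isinstance(func.__globals__.get(name), types.ModuleType)
    }
    return marshal.dumps((code, modules))


def load_by_value(data):
    """Hypothetical helper: rebuild the function, re-importing its modules."""
    code, modules = marshal.loads(data)
    globs = {"__builtins__": builtins}
    for name, module_name in modules.items():
        globs[name] = importlib.import_module(module_name)
    return types.FunctionType(code, globs)


data = dump_by_value(my_func)

# Restoring only the code, with no "time" in its globals, fails on call.
bare = types.FunctionType(marshal.loads(data)[0], {"__builtins__": builtins})
error = None
try:
    bare(1, 2)
except NameError as exc:
    error = exc
print(error)  # name 'time' is not defined

rebuilt = load_by_value(data)
print(rebuilt(1, 2))  # 3
```

A real by-value serializer also has to deal with globals that are not modules (constants, other functions, classes), which is where most of the complexity lies.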
My question
How can I serialize an object in main, and load it again in another script so that all the imports used by that object are also loaded, without doing the nasty little hack in Attempt 2?