I am using "import threading" and python 3.4. Simple case, I have one main parent thread and one child thread. I need to save my dict to file from child thread. In thread function I have variable:
def thread_function(...)
def save_to_file():
this_thread_data.my_dict or nonlocal this_thread_data.my_dict
... json or pickle
this_thread_data = local()
this_thread_data.my_dict = {...}
...
When I use pickle I get this error:

    _pickle.PicklingError: Can't pickle <class '_thread.lock'>: attribute lookup lock on _thread failed

When I use json I get this error:

    TypeError: <threading.Event object at 0x7f49115a9588> is not JSON serializable
Will pickle or json work in a multithreading environment, or do I need to use something else instead?
Thank you.
Python threading (and multiprocessing) and pickling is broken and limited unless you jump outside the standard library.

If you use a fork of multiprocessing called pathos.multiprocessing, you can directly use classes and class methods in multiprocessing's map functions. This is because dill is used instead of pickle or cPickle, and dill can serialize almost anything in python. pathos.multiprocessing provides an interface to the threading module, just like the standard python module does. pathos.multiprocessing also provides an asynchronous map function… and it can map functions with multiple arguments (e.g. map(math.pow, [1,2,3], [4,5,6])).

See: What can multiprocessing and dill do together?
and: http://matthewrocklin.com/blog/work/2013/12/05/Parallelism-and-Serialization/
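A minimal sketch of that multi-argument map (assuming pathos is installed; in some versions the pool class lives at pathos.multiprocessing.ProcessingPool instead):

    import math
    from pathos.pools import ProcessPool

    # map a function across two argument lists at once
    pool = ProcessPool()
    print(pool.map(math.pow, [1, 2, 3], [4, 5, 6]))   # [1.0, 32.0, 729.0]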
Having unusual stuff in a dict doesn't matter…
By the way, if you wanted to pickle a thread lock, you can do that too.
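For example, a quick check that dill can round-trip a dict holding a lock and an Event (a sketch assuming dill is installed):

    >>> import dill
    >>> import threading
    >>> d = {'lock': threading.Lock(), 'flag': threading.Event(), 'value': 42}
    >>> s = dill.dumps(d)   # the standard pickle raises PicklingError here
    >>> dill.loads(s)['value']
    42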
It looks like you wanted to build some sort of closure that would automatically store function calls to a file, or at least to a serialized string. If that's what you want, you could try klepto, which gives you a decorator that you apply to your function and you get caching to memory or disk or to a database. Klepto can use pickle or json, but it's augmented by dill, so it can serialize almost anything in python -- so don't worry about what's in your dict… just serialize it.

Klepto enables you to have all your cached results available when you restart your code. In that case, you'd pick some file or database backend, then ensure you do an add.dump() (where add is your decorated function) to the archive… then restart python or whatever, and do add.load() to load the archived results.

Get the code here: https://github.com/uqfoundation
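A minimal sketch using klepto's file archive directly (assuming klepto is installed; the decorator interface varies between versions, so this shows just the archive):

    from klepto.archives import file_archive

    db = file_archive('results.pkl')      # a dict-like archive backed by a file
    db['my_dict'] = {'answer': 42}
    db.dump()                             # write the archive to disk

    # … later, after restarting python …
    db = file_archive('results.pkl')
    db.load()                             # read the archived results back
    print(db['my_dict'])                  # {'answer': 42}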
Using pickle and json will work fine in a multi-threaded environment (though serialisation itself is probably not thread-safe, so make sure the data you're pickling can't change while it's being written out, for example by using a lock). The catch is that you will be restricted in the kind of data you can actually save to disk.
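For instance, here is a minimal sketch of a child thread dumping a shared dict to JSON while holding a lock (the dict contents, lock and file name are illustrative):

    import json
    import threading

    shared = {'progress': 0, 'items': [1, 2, 3]}   # JSON-friendly values only
    shared_lock = threading.Lock()                 # guards shared during serialisation

    def save_to_file():
        # Hold the lock so no other thread can mutate the dict mid-dump.
        with shared_lock:
            with open('state.json', 'w') as f:
                json.dump(shared, f)

    child = threading.Thread(target=save_to_file)
    child.start()
    child.join()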
Not all objects are serialisable, as you have found. The simplest approach is to make sure your dictionary only has values that are compatible with pickle or the json serialiser. For example, you seem to have stored a lock object in your dictionary that is making pickle fail. You might want to create a new dictionary with only the values that can be pickled, and then pickle that.
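A sketch of that filtering step (the picklable_subset helper is made up for illustration):

    import pickle

    def picklable_subset(d):
        # Keep only the values that pickle can actually handle.
        clean = {}
        for key, value in d.items():
            try:
                pickle.dumps(value)
                clean[key] = value
            except (pickle.PicklingError, TypeError, AttributeError):
                pass  # skip locks, events, and other unpicklable objects
        return clean

You can then pickle the returned copy as usual.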
Alternatively, if you want to create a custom object to store your data, you can tell pickle exactly how to pickle it. This is more advanced and probably unnecessary in your case, but you can find more documentation here: https://docs.python.org/3.4/library/pickle.html#pickling-class-instances
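For example, a sketch of a container class that drops its lock when pickled and recreates it on unpickling (the class itself is hypothetical):

    import threading

    class ThreadData:
        def __init__(self):
            self.my_dict = {}
            self.lock = threading.Lock()

        def __getstate__(self):
            # Copy the instance state, leaving out the unpicklable lock.
            state = self.__dict__.copy()
            del state['lock']
            return state

        def __setstate__(self, state):
            # Restore the picklable state and make a fresh lock.
            self.__dict__.update(state)
            self.lock = threading.Lock()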
There are better ways to share data between threads. If you're open to using processes instead of threads, I would recommend the python 'multiprocessing' module, specifically the 'Manager' class: https://docs.python.org/2/library/multiprocessing.html#managers. Here is a toy example, sketched below, that prints [1, 2, 3].
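A minimal sketch of such an example, with a child process appending to a Manager-backed list (the worker function and the values it appends are illustrative):

    from multiprocessing import Process, Manager

    def worker(shared):
        # Appends go through the manager proxy, so the parent sees them.
        for i in (1, 2, 3):
            shared.append(i)

    if __name__ == '__main__':
        with Manager() as manager:
            shared = manager.list()
            p = Process(target=worker, args=(shared,))
            p.start()
            p.join()
            print(list(shared))   # prints [1, 2, 3]

Unlike objects stored in a plain dict, the manager proxy works across process boundaries, and its contents are ordinary picklable values.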