I'm doing a fair amount of parallel processing in Python using the `multiprocessing` module. I know certain objects CAN be pickled (and thus passed as arguments to `multiprocessing` workers) and others can't. For example:
```
>>> import pickle
>>> class abc():
...     pass
...
>>> a=abc()
>>> pickle.dumps(a)
'ccopy_reg\n_reconstructor\np1\n(c__main__\nabc\np2\nc__builtin__\nobject\np3\nNtRp4\n.'
```
But I have some larger classes in my code (a dozen or so methods), and this happens:
```
>>> a=myBigClass()
>>> pickle.dumps(a)
Traceback (innermost last):
  File "<stdin>", line 1, in <module>
  File "/usr/apps/Python279/python-2.7.9-rhel5-x86_64/lib/python2.7/copy_reg.py", line 70, in _reduce_ex
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle file objects
```
My class isn't a file object, but at other times I get other messages that basically say "I can't pickle this".
So what's the rule? Number of bytes? Depth of hierarchy? Phase of the moon?
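A minimal Python 3 sketch suggesting that class size is not the deciding factor: pickle records an instance's class by reference plus the instance's `__dict__`, so methods are never serialized, but any attribute holding a live resource (here an open file, matching the traceback above) makes the whole object unpicklable. `ManyMethods`, `OneMethod` and `try_pickle` are hypothetical names for illustration, and the exact error wording differs between Python versions.

```python
import os
import pickle
import tempfile


class ManyMethods:
    """Hypothetical 'big' class: several methods, but only plain data per instance."""

    def __init__(self):
        self.values = [1, 2, 3]
        self.name = "plain data"

    def first(self):
        return self.values[0]

    def last(self):
        return self.values[-1]

    def total(self):
        return sum(self.values)
    # ...add as many methods as you like; they live on the class, and pickle
    # only records the class by reference plus the instance's __dict__.


class OneMethod:
    """Hypothetical 'small' class that keeps an open file handle."""

    def __init__(self, path):
        self.handle = open(path)  # a live OS resource, not data

    def read(self):
        return self.handle.read()


def try_pickle(obj):
    """Return 'ok', or a short description of why pickling failed."""
    try:
        pickle.dumps(obj)
    except TypeError as exc:
        return f"failed: {exc}"
    return "ok"


with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hello")

small = OneMethod(tmp.name)
results = {
    "ManyMethods": try_pickle(ManyMethods()),
    "OneMethod": try_pickle(small),
}
small.handle.close()
os.remove(tmp.name)

for name, outcome in results.items():
    print(name, "->", outcome)
```

In other words, a useful first step when `pickle.dumps` fails on a large class is to look for attributes that hold files, sockets, locks, or similar handles.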
In addition to icedtrees' answer, and also coming straight from the docs, you can customize and control how class instances are pickled and unpickled using the special methods `object.__getnewargs_ex__()`, `object.__getnewargs__()`, `object.__getstate__()`, and `object.__setstate__(state)` (`__getnewargs_ex__` was added in Python 3.4).
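A minimal sketch of the `__getstate__`/`__setstate__` pair in Python 3, assuming the unpicklable part of the object is an open text file that can be reopened from its path. `LogReader` and its attributes are hypothetical; the idea is to pickle only the logical description of the resource (path and read offset) and rebuild the handle on unpickling.

```python
import os
import pickle
import tempfile


class LogReader:
    """Hypothetical class: an open file plus some plain data."""

    def __init__(self, path):
        self.path = path
        self.lines_seen = 0
        self.handle = open(path)

    def __getstate__(self):
        # Copy the instance dict, drop the unpicklable file handle,
        # and remember the read position so it can be restored later.
        state = self.__dict__.copy()
        handle = state.pop("handle")
        state["offset"] = handle.tell()
        return state

    def __setstate__(self, state):
        # Rebuild the resource from its logical description (path + offset).
        offset = state.pop("offset")
        self.__dict__.update(state)
        self.handle = open(self.path)
        self.handle.seek(offset)


with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("first\nsecond\n")

reader = LogReader(tmp.name)
reader.handle.readline()
reader.lines_seen = 1

clone = pickle.loads(pickle.dumps(reader))
next_line = clone.handle.readline()  # continues where the original left off

reader.handle.close()
clone.handle.close()
os.remove(tmp.name)
print(repr(next_line))
```

Note that the clone gets its own, separate file handle; the two objects share the file's contents, not the OS resource.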
I'm the `dill` author. There's a fairly comprehensive list of what pickles and what doesn't as part of `dill`. It can be run for each version of Python 2.5–3.4, and adjusted to show what pickles with `dill` or what pickles with `pickle` by changing one flag. See here and here.

The root of the rules for what pickles is (off the top of my head):

1. (…`__main__` versus an imported function)? [Then, yes]
2. Does a `__getstate__` and `__setstate__` rule exist for the given object type? [Then, yes]
3. Does it depend on a `Frame` object (i.e. rely on the GIL and global execution stack)? Iterators are now an exception to this, by "replaying" the iterator on unpickling. [Then, no]
4. (…`__init__` path manipulations)? [Then, no]
5. …

So, (5) is less prevalent now than it used to be, but it still has some lasting effects in the language for `pickle`. `dill`, for the most part, removes (1), (2), and (5), but is still fairly affected by (3) and (4).

I might be forgetting something else, but I think in general those are the underlying rules.
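As a rough illustration of the by-reference, frame, and class-path cases in the list above (my own reading of those rules, not code from `dill`), this Python 3 sketch checks a few objects with the standard `pickle` module. The helper names `make_local_class`, `numbers` and `picklable` are hypothetical, and the exception type raised for each failure varies across Python versions, so the helper catches all three plausible ones.

```python
import math
import pickle


def make_local_class():
    # The class's qualified name contains "<locals>", so pickle cannot
    # look it up again by module + name when reconstructing.
    class Local:
        pass
    return Local


def numbers():
    # A generator carries a live execution frame.
    yield 1
    yield 2


def picklable(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


cases = {
    "imported function (stored by reference)": math.sqrt,
    "lambda (no importable name)": lambda x: x,
    "instance of a class defined in a function": make_local_class()(),
    "generator (holds a frame)": numbers(),
}
for label, obj in cases.items():
    print(f"{label}: {picklable(obj)}")

# Functions are stored by reference, so unpickling returns the same object.
print(pickle.loads(pickle.dumps(math.sqrt)) is math.sqrt)
```

Only the imported function survives; it round-trips because pickle stores just its module and name, which is also why a function defined only in the sending process's `__main__` can cause trouble for `multiprocessing` workers.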
Certain modules like `multiprocessing` register some objects that are important for their functioning. `dill` registers the majority of objects in the language.

The `dill` fork of `multiprocessing` is required because `multiprocessing` uses `cPickle`, and `dill` can only augment the pure-Python pickling registry. You could, if you have the patience, go through all the relevant `copy_reg` functions in `dill` and apply them to the `cPickle` module, and you'd get a much more pickle-capable `multiprocessing`. I've found a simple (read: one-liner) way to do this for `pickle`, but not for `cPickle`.

The general rule of thumb is that "logical" objects can be pickled, but "resource" objects (files, locks) can't, because it makes no sense to persist or clone them.
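The registry idea and the "logical vs. resource" rule fit together: you can register a reduction function from outside a class so that pickle saves only its logical state and creates a fresh resource on unpickling. This is a minimal Python 3 sketch using the standard `copyreg.pickle` function (spelled `copy_reg` in Python 2); it is not how `dill` implements its registrations, and `Counter`, `rebuild_counter` and `reduce_counter` are hypothetical names.

```python
import copyreg
import pickle
import threading


class Counter:
    """Hypothetical third-party class we pretend we cannot edit."""

    def __init__(self, value=0):
        self.value = value            # logical state: worth copying
        self.lock = threading.Lock()  # resource: meaningless to copy


def rebuild_counter(value):
    """Reconstructor called at unpickling time: same value, fresh lock."""
    return Counter(value)


def reduce_counter(counter):
    """Reduction function: describe a Counter as rebuild_counter(value)."""
    return rebuild_counter, (counter.value,)


# Without a registered reducer, pickling fails because the instance holds a lock.
try:
    pickle.dumps(Counter(1))
    before = "ok"
except TypeError:
    before = "TypeError"

# Register the reducer externally; pickle consults copyreg's dispatch table.
copyreg.pickle(Counter, reduce_counter)

original = Counter(5)
clone = pickle.loads(pickle.dumps(original))
print(before, clone.value, clone.lock is original.lock)  # TypeError 5 False
```

Recreating the lock rather than copying it reflects the rule of thumb: a lock's identity only means something inside the process that owns it.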
From the docs: