I have a code framework which involves dumping sessions with dill. This used to work just fine, until I started to use pandas. The following code raises a PicklingError on CentOS release 6.5:
import pandas
import dill
dill.dump_session('x.dat')
The problem seems to stem from pandas.algos. In fact, it's enough to run this to reproduce the error:
import pandas.algos
import dill
dill.dump_session('x.dat')  # or: dill.dumps(pandas.algos)
The error is:
pickle.PicklingError: Can't pickle <cyfunction lambda1 at 0x1df3050>: it's not found as pandas.algos.lambda1
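For readers unfamiliar with the "it's not found as module.name" wording: pickle serializes plain functions by reference, that is, by module and name, and checks that the name resolves back to the same object. The following is a minimal sketch of that mechanism using only the standard-library pickle module on Python 3. The thread itself uses Python 2.7 and dill, whose handling of cyfunctions may differ. Renaming the function to "lambda1" is an assumption made purely to imitate the situation in the error message.

```python
import pickle


def _return_false():
    return False


# Assumption for illustration: give the function an internal name that does
# not match the attribute it is bound to, imitating the cyfunction that is
# stored as _return_false but reports itself as "lambda1".
_return_false.__name__ = "lambda1"
_return_false.__qualname__ = "lambda1"

error_message = None
try:
    # pickle tries to look the function up under its own name in its module;
    # no attribute called "lambda1" exists there, so the lookup fails.
    pickle.dumps(_return_false)
except pickle.PicklingError as exc:
    error_message = str(exc)

print(error_message)
```

The function is still reachable as _return_false, but pickle only consults the name the object reports for itself, which is why the attribute name alone does not help.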
The thing is, this error is not raised on my PC. Both machines have the same versions of pandas (0.14.1), dill (0.2.1), and Python (2.7.6).
Looking at the bad objects, I get:
>>> dill.detect.badobjects(pandas.algos, depth = 1)
{'__builtins__': <module '__builtin__' (built-in)>,
'_return_true': <cyfunction lambda2 at 0x1484d70>,
'np': <module 'numpy' from '/usr/local/lib/python2.7/site-packages/numpy-1.8.2-py2.7-linux-x86_64.egg/numpy/__init__.pyc'>,
'_return_false': <cyfunction lambda1 at 0x1484cc8>,
'lib': <module 'pandas.lib' from '/home/talkr/.local/lib/python2.7/site-packages/pandas/lib.so'>}
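The output above shows _return_false bound to an object that calls itself lambda1. A rough way to list every such mismatch in a module is sketched below. The helper misnamed_callables is hypothetical (it is not part of dill), and the module demo_algos is a made-up stand-in so the example runs without pandas. It is written for Python 3.

```python
import types


def misnamed_callables(module):
    """Map attribute name -> __name__ for callables defined in `module`
    that cannot be found again under their own __name__."""
    mismatches = {}
    for attr, obj in vars(module).items():
        if not callable(obj):
            continue
        own_name = getattr(obj, "__name__", None)
        if own_name is None:
            continue
        # Skip objects that were imported from some other module.
        if getattr(obj, "__module__", None) != module.__name__:
            continue
        # A by-name lookup only works if the name leads back to this object.
        if getattr(module, own_name, None) is not obj:
            mismatches[attr] = own_name
    return mismatches


# Made-up stand-in module: one lambda bound to a name, one ordinary function.
demo = types.ModuleType("demo_algos")
exec("_return_false = lambda: False\ndef helper():\n    return 1\n", demo.__dict__)

result = misnamed_callables(demo)
print(result)
```

Running the same helper against pandas.algos on both machines would be one way to compare how the compiled functions are named, assuming cyfunctions expose __name__ and __module__ in the usual way.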
This seems to be due to different handling of pandas.algos by the two OSes (perhaps different compilers?). On my PC, where dump_session runs without errors, pandas.algos._return_false is <cyfunction <lambda> at 0x06DD02A0>, while on CentOS it's <cyfunction lambda1 at 0x1df3050>. Why is it handled differently?
I'm not seeing what you are seeing on a Mac. Here's what I see, using the same version of pandas. I do see that you are using a different version of dill. I'm using the version from GitHub. I'll check if there was a tweak to saving modules or globals in dill that might have had that impact on some distros.
Here is what I get for pandas.algos:
Here's what I get for pandas.algos._return_false:
So, I can now reproduce your error.
This looks like an unpicklable object, based on how it's built. However, it should be able to be pickled inside the module… as it is for me. You seem to have pinpointed the difference: it lies in the object that pandas builds on CentOS.
Looking at the pandas codebase, pandas.algos is a pyx file… so that's cython. And here's the code.
Were that in a .py file, I know it would serialize. I have no idea how dill works for cython-generated lambdas… (e.g. a lambda cyfunction).
It looks like there was a commit (https://github.com/pydata/pandas/commit/73c71dfca10012e25c829930508b5d6f7ccad5ff) in which _return_false was moved outside a class into the module scope. Do you see that on both CentOS and your PC? It may be that the v0.14.1 for different distros was cut from slightly different git versions… depending on how you installed pandas.
So apparently, I can pick up a lambda1 by trying to get the source of the object… which for a lambda, if it can't get the source, dill will grab by name… and apparently it's named lambda1… even though that doesn't show up in the .pyx file. Maybe it's due to how cython builds the lambdas.
The difference might be coming from cython… since the code is generated from a .pyx in pandas. What's your version of cython? Mine is 0.20.2.
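To compare environments across the two machines, something like the sketch below could be run on each. The helper collect_versions is hypothetical, and it reads each package's __version__ attribute, which is a convention rather than a guarantee. Note also that the installed Cython is not necessarily the one that generated a prebuilt pandas binary, so this only narrows things down.

```python
import importlib
import platform


def collect_versions(module_names=("pandas", "dill", "Cython")):
    """Return a dict of version strings; None marks a module that is not installed."""
    versions = {"python": platform.python_version()}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # __version__ is only a convention, so fall back to "unknown".
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in sorted(collect_versions().items()):
        print(name, version)
```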