Python pickle not one-to-one: different pickles gi

2019-02-27 15:17发布

问题:

Can someone explain this?

pickle.loads(b'\x80\x03X\x01\x00\x00\x00.q\x00h\x00\x86q\x01.') == pickle.loads(b'\x80\x03X\x01\x00\x00\x00.q\x00X\x01\x00\x00\x00.q\x01\x86q\x02.')
>>>True

pickle.loads(b'\x80\x03X\x01\x00\x00\x00.q\x00h\x00\x86q\x01.')
>>>('.', '.')
pickle.loads(b'\x80\x03X\x01\x00\x00\x00.q\x00X\x01\x00\x00\x00.q\x01\x86q\x02.')
>>>('.', '.')

There seems to be a long and short pickled version of tuples with the same element repeatedly.

Other examples:

pickle.loads(b'\x80\x03X\x01\x00\x00\x00#q\x00X\x01\x00\x00\x00#q\x01\x86q\x02.')
>>>('#', '#')
pickle.loads(b'\x80\x03X\x01\x00\x00\x00#q\x00h\x00\x86q\x01.')
>>>('#', '#')

pickle.loads(b'\x80\x03X\x01\x00\x00\x00$q\x00X\x01\x00\x00\x00$q\x01\x86q\x02.')
>>>('$', '$')
pickle.loads(b'\x80\x03X\x01\x00\x00\x00$q\x00h\x00\x86q\x01.')
>>>('$', '$')

I'm trying to index items by their pickle but I'm not finding the items because their pickles seem to be changing.

I'm using Python 3.3.2 on Ubuntu.

回答1:

Pickles aren't unique; the pickle format is actually a tiny little programming language, and different programs (pickles) can produce the same output (unpickled object). From the docs:

Since the pickle data format is actually a tiny stack-oriented programming language, and some freedom is taken in the encodings of certain objects, it is possible that the two modules [pickle and cPickle] produce different data streams for the same input objects. However it is guaranteed that they will always be able to read each other’s data streams.

There's even a pickletools.optimize function that will take a pickle and output a better pickle. You're going to need to redesign your program.