Understanding Pickling in Python

Posted 2019-01-16 14:05

I recently got an assignment where I need to put a dictionary (where each key refers to a list) in pickled form. The only problem is that I have no idea what pickled form is. Could anyone point me toward some good resources to help me learn this concept? Thanks!

7 Answers
再贱就再见
#2 · 2019-01-16 14:24

While others have pointed to the Python documentation on the pickle module, which is a great resource, you can also check out Chapter 13: Serializing Python Objects of Dive Into Python 3 by Mark Pilgrim.

祖国的老花朵
#3 · 2019-01-16 14:24

Pickling uses a mini-language to convert the relevant state of a Python object into a string, where this string uniquely represents the object. Unpickling then converts the string back into a live object by "reconstructing" the object from the saved state found in the string.

>>> import pickle
>>> 
>>> class Foo(object):
...   y = 1
...   def __init__(self, x):
...     self.x = x
...     return
...   def bar(self, y):
...     return self.x + y
...   def baz(self, y):
...     Foo.y = y  
...     return self.bar(y)
... 
>>> f = Foo(2)
>>> f.baz(3)
5
>>> f.y
3
>>> pickle.dumps(f)
"ccopy_reg\n_reconstructor\np0\n(c__main__\nFoo\np1\nc__builtin__\nobject\np2\nNtp3\nRp4\n(dp5\nS'x'\np6\nI2\nsb."

What you can see here is that pickle doesn't save the source code for the class, but does store a reference to the class definition. Basically, you can almost read the pickled string… it says (roughly translated) "call copy_reg's reconstructor where the arguments are the class defined by __main__.Foo and then do other stuff". The other stuff is the saved state of the instance. If you look deeper, you can see that "string x" is set to "the integer 2" (roughly: S'x'\np6\nI2). This is actually a clipped part of the pickled string for a dictionary entry… the dict being f.__dict__, which is {'x': 2}. Note that the class attribute y is not part of the saved state, since it lives on Foo rather than on the instance. If you look at the source code for pickle, it very clearly gives a translation for each type of object and operation from Python to pickled byte code.
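Rather than reading the raw string by eye, you can ask the standard-library pickletools module to list the instructions in a pickle. The following is a minimal sketch, assuming Python 3; the variable names are only illustrative, and it pickles the instance dictionary {'x': 2} directly instead of a Foo instance.

```python
import pickle
import pickletools

# Protocol 0 is the text-based dialect discussed above.
state = {'x': 2}
pickled = pickle.dumps(state, protocol=0)

# genops yields (opcode, argument, position) for every instruction in the pickle.
ops = [(op.name, arg) for op, arg, pos in pickletools.genops(pickled)]
for name, arg in ops:
    print(name, repr(arg))

# pickletools.dis(pickled) prints a similar, more detailed listing.
```

On Python 3 the key should show up under a UNICODE opcode rather than the S (STRING) opcode seen in the Python 2 output above, while the value is still an INT with argument 2.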

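Note also that there are different variants (protocols) of the pickling language. In Python 2, which is used here, the default is protocol 0, the most human-readable one. Passing -1 selects the highest protocol available, which in Python 2 is protocol 2, shown below (protocols 1, 3, and 4 also exist, depending on the version of Python you are using).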

>>> pickle.dumps([1,2,3])
'(lp0\nI1\naI2\naI3\na.'
>>> 
>>> pickle.dumps([1,2,3], -1)
'\x80\x02]q\x00(K\x01K\x02K\x03e.'

Again, it's still a dialect of the pickling language, and you can see that the protocol 0 string says "get a list, include I1, I2, I3", while protocol 2 is harder to read but says the same thing. The first two bytes, \x80\x02, indicate that it's protocol 2; then you have ], which says it's a list, and then you can see the integers 1, 2, 3 in there (as K\x01, K\x02, K\x03). Again, check the source code for pickle to see the exact mapping for the pickling language.
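The same comparison can be reproduced with an explicit protocol argument. This is a small sketch, assuming Python 3, where dumps returns bytes rather than str and the default protocol is no longer 0; the variable names are made up for the example.

```python
import pickle

data = [1, 2, 3]

text_form = pickle.dumps(data, protocol=0)    # ASCII-based dialect
binary_form = pickle.dumps(data, protocol=2)  # binary dialect

print(text_form)
print(binary_form)

# Protocol 2 and later start with the PROTO opcode (0x80) and the protocol number.
print(binary_form[:2] == b'\x80\x02')

# Both dialects decode to the same object; loads detects the protocol by itself.
print(pickle.loads(text_form) == pickle.loads(binary_form) == data)

# What this interpreter uses by default, and the newest protocol it supports.
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)
```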

To reverse the pickling, that is, to turn the string back into an object, use load/loads.

>>> p = pickle.dumps([1,2,3])
>>> pickle.loads(p)
[1, 2, 3]
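The difference between the two pairs is where the pickle lives: dumps/loads work on a string held in memory, while dump/load work on a file-like object. Here is a rough sketch, assuming Python 3; io.BytesIO stands in for a real file, and the record dictionary is invented for the example.

```python
import io
import pickle

record = {'name': 'example', 'values': [1, 2, 3]}  # illustrative data

# dumps/loads: object <-> bytes held in memory
blob = pickle.dumps(record)
copy_from_bytes = pickle.loads(blob)

# dump/load: object <-> any binary file-like object
buffer = io.BytesIO()      # stands in for a file opened in 'wb' mode
pickle.dump(record, buffer)
buffer.seek(0)             # rewind before reading
copy_from_stream = pickle.load(buffer)

print(copy_from_bytes == record, copy_from_stream == record)
print(copy_from_bytes is record)  # unpickling builds a new object
```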
叼着烟拽天下
#4 · 2019-01-16 14:25

Pickling is just serialization: putting data into a form that can be stored in a file and retrieved later. Here are the docs on the pickle module:

http://docs.python.org/release/2.7/library/pickle.html
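For the case in the question, a dictionary whose keys each refer to a list, storing and retrieving could look roughly like this. It is only a sketch: the grades data and the file name are hypothetical, and a temporary directory is used so nothing is left behind in the working directory.

```python
import os
import pickle
import tempfile

# Hypothetical assignment data: every key maps to a list.
grades = {'alice': [90, 85, 77], 'bob': [60, 72], 'carol': []}

path = os.path.join(tempfile.mkdtemp(), 'grades.pkl')

# Pickle files must be opened in binary mode.
with open(path, 'wb') as f:
    pickle.dump(grades, f)

with open(path, 'rb') as f:
    restored = pickle.load(f)

print(restored == grades)
```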

ゆ 、 Hurt°
#5 · 2019-01-16 14:25

The pickle module implements a fundamental, but powerful algorithm for serializing and de-serializing a Python object structure. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream is converted back into an object hierarchy. Pickling (and unpickling) is alternatively known as “serialization”, “marshaling,” or “flattening”; however, to avoid confusion, the terms used here are “pickling” and “unpickling”.

The pickle module has an optimized cousin called the cPickle module. As its name implies, cPickle is written in C, so it can be up to 1000 times faster than pickle. However, it does not support subclassing of the Pickler() and Unpickler() classes, because in cPickle these are functions, not classes. Most applications have no need for this functionality and can benefit from the improved performance of cPickle. Other than that, the interfaces of the two modules are nearly identical; the common interface is described in this manual and differences are pointed out where necessary. In the following discussions, we use the term “pickle” to collectively describe the pickle and cPickle modules.

The data streams the two modules produce are guaranteed to be interchangeable.
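To make the subclassing point concrete, here is a hedged sketch of an Unpickler subclass, assuming Python 3, where there is no separate cPickle module and pickle picks up its C accelerator automatically while staying subclassable. The class name NoGlobalsUnpickler and the helper plain_loads are invented for illustration; find_class is the documented hook that an unpickler calls to resolve a class or function reference.

```python
import io
import pickle

class NoGlobalsUnpickler(pickle.Unpickler):
    """Hypothetical subclass: refuses to resolve any class or function reference."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(
            "global '%s.%s' is not allowed" % (module, name))

def plain_loads(data):
    # Helper (an assumption of this sketch) mirroring pickle.loads.
    return NoGlobalsUnpickler(io.BytesIO(data)).load()

# Plain containers need no class lookup, so they load normally.
print(plain_loads(pickle.dumps({'a': [1, 2, 3]})))

# A function is pickled by reference, so find_class is consulted and rejects it.
try:
    plain_loads(pickle.dumps(len))
except pickle.UnpicklingError as exc:
    print('rejected:', exc)
```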

倾城 Initia
#6 · 2019-01-16 14:40

Pickling in Python is used to serialize and de-serialize Python objects, such as the dictionary in your case. I usually use the cPickle module, as it can be much faster than the pickle module.

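try:
    import cPickle as pickle  # Python 2: C implementation
except ImportError:
    import pickle  # Python 3: the C accelerator is used automatically

def serializeObject(pythonObj):
    # Pickle the object with the highest protocol available.
    return pickle.dumps(pythonObj, pickle.HIGHEST_PROTOCOL)

def deSerializeObject(pickledObj):
    # Rebuild the object from its pickled form.
    return pickle.loads(pickledObj)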
Juvenile、少年°
#7 · 2019-01-16 14:41

The pickle module implements a fundamental, but powerful algorithm for serializing and de-serializing a Python object structure.

Pickling is the process whereby a Python object hierarchy is converted into a byte stream, and unpickling is the inverse operation, whereby a byte stream is converted back into an object hierarchy.

Pickling (and unpickling) is alternatively known as serialization, marshalling, or flattening.

import pickle

data1 = {'a': [1, 2.0, 3, 4+6j],
         'b': ('string', u'Unicode string'),
         'c': None}

selfref_list = [1, 2, 3]
selfref_list.append(selfref_list)

# Open in binary mode; the with block closes the file automatically.
with open('data.pkl', 'wb') as output:
    # Pickle the dictionary using the default protocol (protocol 0 in Python 2).
    pickle.dump(data1, output)

    # Pickle the list using the highest protocol available.
    pickle.dump(selfref_list, output, -1)

To read from a pickled file:

import pprint, pickle

# Objects come back in the same order they were dumped.
with open('data.pkl', 'rb') as pkl_file:
    data1 = pickle.load(pkl_file)
    pprint.pprint(data1)

    data2 = pickle.load(pkl_file)
    pprint.pprint(data2)

source - https://docs.python.org/2/library/pickle.html
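The self-referential list in the example is worth a closer look: pickle keeps track of objects it has already written, so the reference cycle is preserved in the copy. A short sketch, assuming Python 3 and using an in-memory round trip instead of data.pkl:

```python
import pickle

selfref_list = [1, 2, 3]
selfref_list.append(selfref_list)  # the list now contains itself

restored = pickle.loads(pickle.dumps(selfref_list))

print(restored[:3])
print(restored[3] is restored)   # the cycle points at the copy itself
print(restored is selfref_list)  # and the copy is a separate object
```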
