I need to save to disk a little dict object whose keys are of type str and values are ints, and then recover it. Something like this:
{'juanjo': 2, 'pedro':99, 'other': 333}
What is the best option and why? Serialize it with pickle or with simplejson?
I am using Python 2.6.
I prefer JSON over pickle for my serialization. Unpickling can run arbitrary code, and using pickle to transfer data between programs or store data between sessions is a security hole. JSON does not introduce a security hole and is standardized, so the data can be accessed by programs in different languages if you ever need to.
If you are primarily concerned with speed and space, use cPickle because cPickle is faster than JSON.
If you are more concerned with interoperability, security, and/or human readability, then use JSON.
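To make the security point concrete, here is a minimal sketch of why unpickling untrusted data is risky. The class name Sneaky is hypothetical, and the callable it names (len) is deliberately harmless; the point is only that pickle.loads calls whatever the payload tells it to call.

```python
import pickle


class Sneaky(object):
    """Hypothetical class whose pickled form names a callable to run on load."""

    def __reduce__(self):
        # pickle stores "call len('abc')" instead of the object's state.
        # len is harmless, but the payload could name any importable callable.
        return (len, ("abc",))


payload = pickle.dumps(Sneaky())

# Loading the bytes calls len("abc"); no Sneaky instance comes back.
result = pickle.loads(payload)
print(result)  # 3
```

json.loads has no equivalent hook: it only ever builds dicts, lists, strings, numbers, booleans and None.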
The test results referenced in other answers were recorded in 2010, and the updated tests in 2016 with cPickle protocol 2 show:
Reproduce this yourself with this gist, which is based on Konstantin's benchmark referenced in other answers, but using cPickle with protocol 2 instead of pickle, and using json instead of simplejson (since json is faster than simplejson), e.g.
Results with Python 2.7 on a decent 2015 Xeon processor:
Python 3.4 with pickle protocol 3 is even faster.
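If you want to measure this on your own machine, something along these lines should work. It is not the gist mentioned above, just a rough sketch: the test data, the time_call helper and the repeat count are all made up for illustration, and the numbers will depend on your data and Python version.

```python
import json
import timeit

try:
    import cPickle as pickle  # Python 2: the C implementation
except ImportError:
    import pickle  # Python 3: the standard module already uses the C version

# Hypothetical test data: many small str -> int entries.
data = dict(("key%d" % i, i) for i in range(1000))


def time_call(func, number=200):
    """Return the total seconds taken by `number` calls of func()."""
    return timeit.timeit(func, number=number)


json_blob = json.dumps(data)
pickle_blob = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)

timings = {
    "json dumps": time_call(lambda: json.dumps(data)),
    "json loads": time_call(lambda: json.loads(json_blob)),
    "pickle dumps": time_call(lambda: pickle.dumps(data, pickle.HIGHEST_PROTOCOL)),
    "pickle loads": time_call(lambda: pickle.loads(pickle_blob)),
}

for name in sorted(timings):
    print("%-12s %.4f s" % (name, timings[name]))
```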
I have tried several methods and found that calling the dumps method of cPickle with the protocol argument set as follows:
cPickle.dumps(obj, protocol=cPickle.HIGHEST_PROTOCOL)
is the fastest dump method.
Output:
If you do not have any interoperability requirements (e.g. you are just going to use the data with Python) and a binary format is fine, go with cPickle which gives you really fast Python object serialization.
If you want interoperability or you want a text format to store your data, go with JSON (or some other appropriate format depending on your constraints).
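For the dict in the question, both options come down to a few lines. The sketch below writes it to disk and reads it back each way; the file names and the scratch directory are placeholders, and on Python 2 you could import cPickle in place of pickle.

```python
import json
import os
import pickle
import tempfile

scores = {'juanjo': 2, 'pedro': 99, 'other': 333}

# Hypothetical file locations inside a scratch directory.
workdir = tempfile.mkdtemp()
json_path = os.path.join(workdir, 'scores.json')
pickle_path = os.path.join(workdir, 'scores.pkl')

# JSON: a text file, readable in any editor or language.
with open(json_path, 'w') as f:
    json.dump(scores, f)
with open(json_path) as f:
    from_json = json.load(f)

# pickle: a binary file, so open it in 'wb'/'rb' mode.
with open(pickle_path, 'wb') as f:
    pickle.dump(scores, f, pickle.HIGHEST_PROTOCOL)
with open(pickle_path, 'rb') as f:
    from_pickle = pickle.load(f)

print(from_json == scores)
print(from_pickle == scores)
```

Note that on Python 2 json.load gives back unicode keys rather than str, which compares equal here but may matter elsewhere.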
JSON or pickle? How about JSON and pickle! You can use jsonpickle. It is easy to use and the file on disk is readable because it's JSON.
http://jsonpickle.github.com/
Personally, I generally prefer JSON because the data is human-readable. Definitely, if you need to serialize something that JSON won't take, then use pickle.
But for most data storage, you won't need to serialize anything weird and JSON is much easier and always allows you to pop it open in a text editor and check out the data yourself.
The speed is nice, but for most datasets the difference is negligible; Python generally isn't too fast anyways.
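As an example of "something that JSON won't take", here is a small sketch with made-up data: a set as a value and a tuple as a key. The standard json module rejects it with a TypeError, while pickle round-trips it unchanged.

```python
import json
import pickle

# Hypothetical data that JSON cannot represent directly: a set value
# and a tuple used as a dict key.
awkward = {'tags': set(['a', 'b']), ('x', 1): 'tuple key'}

try:
    json.dumps(awkward)
    json_ok = True
except TypeError:
    json_ok = False

# pickle round-trips the same object with its types intact.
restored = pickle.loads(pickle.dumps(awkward))

print(json_ok)              # False
print(restored == awkward)  # True
```

For a plain str-to-int dict like the one in the question, none of this applies and JSON handles it fine.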