I'm working through some scipy lectures (http://scipy-lectures.github.io/intro/language/standard_library.html#pickle-easy-persistence) and I came across this statement about Pickle:
Useful to store arbitrary objects to a file. Not safe or fast!
What do they mean by this? "Not safe" as in (according to the pickle docs) you shouldn't unpickle files from an unknown origin, or "not safe" as in you don't always get back the original object?
What's the alternative if I want something safer and faster? I know cPickle is faster, but I don't think it is any safer in the above sense.
Thanks.
Using pickle in production code is insecure by design: arbitrary code can be executed while unpickling. You can safely unpickle only data from trusted sources. Never unpickle data received from an untrusted or unauthenticated source.
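To make "arbitrary code" concrete, here is a minimal, deliberately harmless sketch (Python 3 assumed; the class name `Malicious` is hypothetical). pickle rebuilds an object by calling whatever callable its `__reduce__` method names, so whoever produces the bytes decides what `pickle.loads()` executes:

```python
import os
import pickle


class Malicious:
    """Hypothetical attacker-controlled class, used only for this demo."""

    def __reduce__(self):
        # __reduce__ tells pickle "to rebuild me, call this callable with these
        # arguments". The sender controls both, so here the receiver is made to
        # evaluate an arbitrary (but harmless) Python expression.
        return (eval, ("__import__('os').getcwd()",))


payload = pickle.dumps(Malicious())

# The receiver only calls loads(); eval runs during unpickling, and the value
# returned is whatever the expression produced, not a Malicious instance.
result = pickle.loads(payload)
print(type(result).__name__, result)
```

The payload only references the built-in `eval`, so the receiving side does not even need the `Malicious` class to exist; a real attacker would simply swap in something like `os.system`.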
See here for real-world examples.
As for a faster alternative, there is marshal, Python's internal serialization module. But unlike pickle (or cPickle, which is just a C implementation of it), its format is less stable (see the docs): the output is independent of architecture and OS, but it depends on the Python version. That is, an object marshalled on Windows with Python 2.7.5 is guaranteed to be unmarshallable on OS X or Ubuntu with Python 2.7.5 installed, but not with Python 2.6 on Windows. Note that marshal is not intended to be secure against maliciously constructed data either, so it only addresses the "fast" part.
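A minimal sketch of what marshal can and cannot do (Python 3 assumed; the `Point` class is a hypothetical example type). It round-trips core built-in types exactly, but refuses instances of your own classes, which is the main functional gap compared to pickle:

```python
import marshal

# marshal handles core built-in types: None, bools, numbers, strings,
# bytes, tuples, lists, sets, dicts (and code objects).
data = {"name": "sample", "values": [1, 2, 3], "pair": (4.5, None)}
blob = marshal.dumps(data)
restored = marshal.loads(blob)
print(restored == data)  # tuples stay tuples, unlike with JSON


class Point:
    """Hypothetical user-defined type."""

    def __init__(self, x, y):
        self.x, self.y = x, y


# Instances of user-defined classes are not supported at all.
error = None
try:
    marshal.dumps(Point(1, 2))
except ValueError as exc:
    error = str(exc)
print("marshal refused:", error)

# The format version is tied to the interpreter, not to the OS or architecture.
print("marshal format version:", marshal.version)
```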
Another alternative, faster and safer by design but less functional, is JSON.
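A short sketch of both sides of that trade-off (Python 3 assumed; `Point` is again a hypothetical type). `json.loads()` only ever builds dicts, lists, strings, numbers, booleans and `None`, so loading untrusted text cannot run code the way unpickling can, but some Python types do not survive the round trip and arbitrary objects need a custom encoder:

```python
import json

data = {"name": "sample", "values": [1, 2, 3], "pair": (4.5, None), 1: True}
text = json.dumps(data)
print(text)
restored = json.loads(text)

# JSON only knows objects, arrays, strings, numbers, booleans and null, so
# some Python types change on the way back:
print(restored["pair"])  # the tuple comes back as a list
print(restored["1"])     # non-string keys come back as strings


class Point:
    """Hypothetical user-defined type."""

    def __init__(self, x, y):
        self.x, self.y = x, y


# Arbitrary objects are rejected unless you supply your own encoding.
error = None
try:
    json.dumps(Point(1, 2))
except TypeError as exc:
    error = str(exc)
print("json refused:", error)
```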
The original pure-Python pickle module is almost never used.
If you need to do it fast, use cPickle.
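A common idiom, assuming code that must run on both Python 2 and Python 3: import cPickle when it exists and fall back to pickle otherwise (on Python 3, pickle already uses its C accelerator automatically when available). Using the highest protocol also helps, since it selects the newest binary format the interpreter supports:

```python
try:
    import cPickle as pickle  # Python 2: the C implementation must be chosen explicitly
except ImportError:
    import pickle  # Python 3: pickle uses the C accelerator (_pickle) when available

data = {"name": "sample", "values": list(range(5)), "pair": (4.5, None)}

# HIGHEST_PROTOCOL selects the newest binary format supported by the
# running interpreter.
blob = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(blob)
print(restored == data)
```

This only changes speed: loading untrusted bytes with cPickle is exactly as dangerous as with pickle.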
If you need a safe one, try sPickle.
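I can't vouch for sPickle's API, so this sketch uses a different, standard-library technique instead: the "Restricting Globals" approach described in the Python 3 pickle docs, which overrides `Unpickler.find_class` so only an explicit whitelist of globals can be loaded. The names `RestrictedUnpickler`, `restricted_loads`, `SAFE_BUILTINS` and `Evil` are illustrative:

```python
import builtins
import io
import pickle

# Only these globals may be looked up while unpickling; everything else is refused.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream references a global (class or function).
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" % (module, name))


def restricted_loads(data):
    """Like pickle.loads(), but only whitelisted globals can be resolved."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Plain data and whitelisted types still load normally.
ok = restricted_loads(pickle.dumps([1, "two", range(3), {4, 5}]))
print(ok)


class Evil:
    """Hypothetical malicious payload, as in the earlier example."""

    def __reduce__(self):
        return (eval, ("1 + 1",))


blocked = None
try:
    restricted_loads(pickle.dumps(Evil()))
except pickle.UnpicklingError as exc:
    blocked = str(exc)
print("blocked:", blocked)
```

The protection is only as good as the whitelist: every class of your own that you want to load has to be added explicitly, and anything that can call other code (such as `eval` or `os.system`) must stay out.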