The pickle module documentation says right at the beginning:
Warning: The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
However, further down under restricting globals it seems to describe a way to make unpickling data safe using a whitelist of allowed objects.
Does this mean that I can safely unpickle untrusted data if I use a RestrictedUnpickler that allows only some "elementary" types, or are there additional security issues that are not addressed by this method? If there are, is there another way to make unpickling safe (obviously at the cost of not being able to unpickle every stream)?
By "elementary types" I mean precisely the following:
bool
str, bytes, bytearray
int, float, complex
tuple, list, dict, set and frozenset
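For concreteness, a restricted unpickler of the kind being asked about might look roughly like the sketch below. It follows the find_class override pattern from the "Restricting Globals" section of the pickle documentation; the names SAFE_BUILTINS and restricted_loads are illustrative choices, and the sketch assumes pickles written by Python 3 with protocol 3 or later.

```python
import builtins
import io
import pickle

# Names from `builtins` that the unpickler may resolve. Most elementary
# values (int, str, list, dict, ...) are rebuilt by dedicated opcodes and
# never reach find_class; a few (for example complex) are looked up by name.
SAFE_BUILTINS = frozenset({
    "bool", "str", "bytes", "bytearray",
    "int", "float", "complex",
    "tuple", "list", "dict", "set", "frozenset",
})


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Permit only the whitelisted builtins; reject every other global.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Analogous to pickle.loads(), but using the whitelist above."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

This only illustrates the mechanism: a pickle that references any global outside the whitelist (for example builtins.print) raises pickle.UnpicklingError. Whether such a whitelist is sufficient against a hostile stream is exactly the open question here.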
I'd go so far as saying that there is no safe way to use pickle to handle untrusted data.
Even with restricted globals, the dynamic nature of Python is such that a determined hacker still has a chance of finding a way back to the __builtins__ mapping and from there to the Crown Jewels.
See Ned Batchelder's blog posts on circumventing restrictions on eval() that apply in equal measure to pickle.
Remember that pickle is still a stack language and you cannot foresee all possible objects produced from allowing arbitrary calls even to a limited set of globals. The pickle documentation also doesn't mention the EXT* opcodes that allow calling copyreg-installed extensions; you'll have to account for anything installed in that registry too here. All it takes is one vector allowing an object call to be turned into a getattr equivalent for your defences to crumble.
At the very least, apply a cryptographic signature to your data so you can validate its integrity. You'll limit the risks, but if an attacker ever manages to steal your signing secrets (keys) then they could again slip you a hacked pickle.
I would instead use an existing innocuous format like JSON and add type annotations; e.g. store data in dictionaries with a type key and convert when loading the data.
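A minimal sketch of that approach, assuming a "type" key on every JSON object and the hypothetical helper names encode and decode, could look like this. It covers only a subset of the types from the question, and the tag names and field layout are arbitrary choices for illustration.

```python
import json


def encode(obj):
    """Convert obj into JSON-compatible data, tagging non-JSON types."""
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    if isinstance(obj, list):
        return [encode(x) for x in obj]
    if isinstance(obj, tuple):
        return {"type": "tuple", "items": [encode(x) for x in obj]}
    if isinstance(obj, frozenset):
        return {"type": "frozenset", "items": [encode(x) for x in obj]}
    if isinstance(obj, set):
        return {"type": "set", "items": [encode(x) for x in obj]}
    if isinstance(obj, bytes):
        return {"type": "bytes", "hex": obj.hex()}
    if isinstance(obj, complex):
        return {"type": "complex", "real": obj.real, "imag": obj.imag}
    if isinstance(obj, dict):
        # Stored as key/value pairs so that non-string keys survive.
        pairs = [[encode(k), encode(v)] for k, v in obj.items()]
        return {"type": "dict", "items": pairs}
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def decode(data):
    """Rebuild Python objects; only the known type tags are accepted."""
    if isinstance(data, list):
        return [decode(x) for x in data]
    if not isinstance(data, dict):
        return data  # None, bool, int, float or str
    kind = data.get("type")
    if kind == "tuple":
        return tuple(decode(x) for x in data["items"])
    if kind == "frozenset":
        return frozenset(decode(x) for x in data["items"])
    if kind == "set":
        return {decode(x) for x in data["items"]}
    if kind == "bytes":
        return bytes.fromhex(data["hex"])
    if kind == "complex":
        return complex(data["real"], data["imag"])
    if kind == "dict":
        return {decode(k): decode(v) for k, v in data["items"]}
    raise ValueError(f"unknown type tag: {kind!r}")
```

A round trip would then be decode(json.loads(json.dumps(encode(value)))). Unlike pickle, the loader can only ever construct the types listed in decode; an unrecognised tag is an error rather than an import.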
This idea has also been discussed on the python-ideas mailing list, when addressing the problem of adding a safe pickle alternative to the standard library. For example here: And also here:
So I don't know why the documentation has not been changed, but in my opinion, using a RestrictedUnpickler to restrict the types that can be unpickled is a safe solution. Of course there could be bugs in the library that compromise the system, but there could also be a bug in OpenSSL that shows random memory data to everyone who asks.