Python: can I safely unpickle untrusted data?

Posted 2019-01-25 14:51

The pickle module documentation says right at the beginning:

Warning: The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.

However, further down, under "Restricting Globals", it seems to describe a way to make unpickling data safe using a whitelist of allowed objects.

Does this mean that I can safely unpickle untrusted data if I use a RestrictedUnpickler that allows only some "elementary" types, or are there additional security issues that are not addressed by this method? If there are, is there another way to make unpickling safe (obviously at the cost of not being able to unpickle every stream)?

By "elementary types" I mean precisely the following:

  • bool
  • str, bytes, bytearray
  • int, float, complex
  • tuple, list, dict, set and frozenset
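
For reference, the kind of restricted unpickler the question has in mind could be sketched as follows. This is only an illustration of the whitelist approach from the "Restricting Globals" section of the documentation, not a claim that it is safe. The names ALLOWED_BUILTINS and restricted_loads are hypothetical. The sketch assumes pickle protocol 3 or later, where bool, int, float, str, bytes, tuple, list and dict are encoded with dedicated opcodes and never reach find_class, so only the remaining types need a whitelist entry.

```python
import builtins
import io
import pickle

# Hypothetical whitelist: the only globals the unpickler may look up.
# Assumption: protocol >= 3, where the other elementary types use
# dedicated opcodes and do not go through find_class at all.
ALLOWED_BUILTINS = frozenset({"bytearray", "complex", "frozenset", "set"})


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Permit only whitelisted names from the builtins module.
        if module == "builtins" and name in ALLOWED_BUILTINS:
            return getattr(builtins, name)
        # Refuse every other global lookup.
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name)
        )


def restricted_loads(data):
    """Like pickle.loads(), but with the whitelist above enforced."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

With this in place, restricted_loads(pickle.dumps({"z": 1 + 2j})) round-trips, while restricted_loads(pickle.dumps(eval)) raises pickle.UnpicklingError.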

2 answers
够拽才男人
Reply #2 · 2019-01-25 15:13

I'd go so far as saying that there is no safe way to use pickle to handle untrusted data.

Even with restricted globals, the dynamic nature of Python is such that a determined hacker still has a chance of finding a way back to the __builtins__ mapping and from there to the Crown Jewels.

See Ned Batchelder's blog posts on circumventing restrictions on eval() that apply in equal measure to pickle.

Remember that pickle is still a stack language, and you cannot foresee all possible objects produced by allowing arbitrary calls, even to a limited set of globals. The pickle documentation also doesn't mention the EXT* opcodes, which allow calling copyreg-installed extensions; you'll have to account for anything installed in that registry too. All it takes is one vector that lets an object call be turned into a getattr equivalent for your defences to crumble.
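
To make the "stack language" point concrete, here is a minimal sketch that lists the opcodes of a pickle stream without executing it, using pickletools.genops from the standard library. The byte string SUSPICIOUS is a hand-written protocol 0 pickle made up for this illustration, and summarize is a hypothetical helper name.

```python
import pickletools

# Hand-written protocol 0 pickle (illustrative only): look up
# builtins.eval, push the argument tuple ('1+1',), then call it.
SUSPICIOUS = b"cbuiltins\neval\n(S'1+1'\ntR."


def summarize(data):
    """Return (opcode names, referenced globals) without unpickling."""
    opcodes = []
    referenced = []
    for opcode, arg, _pos in pickletools.genops(data):
        opcodes.append(opcode.name)
        # GLOBAL carries "module name" as its argument. Newer protocols
        # use STACK_GLOBAL instead, which takes both from the stack, so
        # this simple check would not see those lookups.
        if opcode.name == "GLOBAL":
            referenced.append(arg)
    return opcodes, referenced


ops, globs = summarize(SUSPICIOUS)
print(ops)
print(globs)
```

The REDUCE opcode in the listing is the call: it applies whatever callable is on the stack to the argument tuple, which is why every whitelisted global has to be considered as something an attacker can invoke with arbitrary arguments.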

At the very least, apply a cryptographic signature to your data so you can validate its integrity. That limits the risk, but if an attacker ever manages to steal your signing secrets (keys), they could again slip you a hacked pickle.
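
A minimal sketch of that signing step, assuming a shared secret key and HMAC-SHA256 from the standard library. The function names sign_and_dump and verify_and_load, and the layout "tag followed by payload", are choices made for this illustration only.

```python
import hashlib
import hmac
import pickle

TAG_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def sign_and_dump(obj, key):
    """Pickle obj and prepend an HMAC-SHA256 tag computed with key."""
    payload = pickle.dumps(obj)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def verify_and_load(blob, key):
    """Check the tag first; unpickle only if it matches."""
    tag, payload = blob[:TAG_SIZE], blob[TAG_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking the mismatch position via timing.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch; refusing to unpickle")
    return pickle.loads(payload)
```

The important design point is the order: the payload is never passed to pickle.loads() unless the tag verifies. As noted above, this only protects you for as long as the key stays secret.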

I would instead use an existing innocuous format like JSON and add type annotations; e.g. store data in dictionaries with a type key and convert when loading the data.
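
One way this could look, as a rough sketch: the "__type__" key and the helper names to_json and from_json are hypothetical conventions invented here, not an established format. Only a few of the types from the question are covered. Tuples come back as lists, because JSON has no tuple type, and ordinary dictionaries that happen to contain a "__type__" key would need extra handling.

```python
import base64
import json


def _encode(obj):
    # Called by json.dumps only for objects it cannot serialize itself.
    if isinstance(obj, (set, frozenset)):
        return {"__type__": type(obj).__name__, "items": list(obj)}
    if isinstance(obj, complex):
        return {"__type__": "complex", "real": obj.real, "imag": obj.imag}
    if isinstance(obj, (bytes, bytearray)):
        text = base64.b64encode(bytes(obj)).decode("ascii")
        return {"__type__": type(obj).__name__, "data": text}
    raise TypeError("cannot serialize %r" % type(obj).__name__)


def _decode(d):
    # Called by json.loads for every decoded JSON object.
    tag = d.get("__type__")
    if tag is None:
        return d
    if tag == "set":
        return set(d["items"])
    if tag == "frozenset":
        return frozenset(d["items"])
    if tag == "complex":
        return complex(d["real"], d["imag"])
    if tag == "bytes":
        return base64.b64decode(d["data"])
    if tag == "bytearray":
        return bytearray(base64.b64decode(d["data"]))
    raise ValueError("unknown type tag: %r" % tag)


def to_json(obj):
    return json.dumps(obj, default=_encode)


def from_json(text):
    return json.loads(text, object_hook=_decode)
```

Unlike pickle, the loader here can only ever construct the handful of types it explicitly names; an unknown tag is rejected rather than looked up.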

做个烂人
Reply #3 · 2019-01-25 15:18

This idea has also been discussed on the python-ideas mailing list, in the context of adding a safe pickle alternative to the standard library. For example here:

To make it safer I would have a restricted unpickler as the default (for load/loads) and force people to override it if they want to loosen restrictions. To be really explicit, I would make load/loads only work with built-in types.

And also here:

I've always wanted a version of pickle.loads() that takes a list of classes that are allowed to be instantiated.

Is the following enough for you: http://docs.python.org/3.4/library/pickle.html#restricting-globals ?

Indeed, it is. Thanks for pointing it out! I've never gotten past the module interface part of the docs. Maybe the warning at the top of the page could also mention that there are ways to mitigate the safety concerns, and point to #restricting-globals?

Yes, that would be a good idea :-)

So I don't know why the documentation has not been changed, but in my opinion, using a RestrictedUnpickler to restrict the types that can be unpickled is a safe solution. Of course, there could be bugs in the library that compromise the system, but there could also be a bug in OpenSSL that shows random memory data to everyone who asks.
