Can Python's string .format() be made safe for

2019-02-08 13:11发布

问题:

I'm working on a web app where users will be able to supply strings that the server will then substitute variables into.

Preferably I'd like to use PEP 3101 format() syntax and I'm looking at the feasibility of overriding methods in Formatter to make it secure for untrusted input.

Here are the risks I can see with .format() as it stands:

  • Padding lets you specify arbitrary lengths, so '{:>9999999999}'.format(..) could run the server out of memory and be a DOS. I'd need to disable this.
  • Format lets you access the fields inside objects, which is useful, but it's creepy that you can access dunder variables and start drilling into bits of the standard library. There's no telling where there might be a getattr() that has side effects or returns something secret. I would whitelist attribute/index access by overriding get_field().
  • I'd need to catch some exceptions, naturally.

My assumptions are:

  • None of the traditional C format string exploits apply to Python, because specifying a parameter is a bounds-checked access into a collection, rather than directly popping off the thread's stack.
  • The web framework I'm using escapes every variable that's substituted into a page template, and so long as it's the last stop before output, I'm safe from cross-site scripting attacks emerging from de-escaping.

What are your thoughts? Possible? Impossible? Merely unwise?


Edit: Armin Ronacher outlines a nasty information leak if you don't filter out dunder variable access, but seems to regard securing format() as feasible:

{local_foo.__init__.__globals__[secret_global]}

http://lucumr.pocoo.org/2016/12/29/careful-with-str-format/

(Personally, I didn't actually go the untrusted format() route in my product, but am updating for the sake of completeness)

回答1:

Good instinct. Yes, an attacker being able to supply arbitrary format string is a vulnerability under python.

  • The denial of service is probably the most simple to address. In this case, limiting the size of the string or the number of operators within the string will mitigate this issue. There should be a setting where no reasonable user will need to generate a string with more variables than X, and this amount of computation isn't at risk of being exploited in a DoS attack.
  • Being able to access attributes within an object could be dangerous. However, I don't think that the Object parent class has any useful information. The object supplied to the format would have to contain something sensitive. In any case, this type of notation can limited with a regular expression.
  • If the format strings are user supplied then a user might need to know the error message for debugging. However, error mesages can contain senstive information such as local paths or class names. Make sure to limit the information that an attacker can obtain.

Look over the python format string specification and forbid functionality you don't want the user to have with a regex.