I'm working on a web app where users will be able to supply strings that the server will then substitute variables into.
Preferably I'd like to use PEP 3101 format() syntax and I'm looking at the feasibility of overriding methods in Formatter to make it secure for untrusted input.
Here are the risks I can see with .format() as it stands:
- Padding lets you specify arbitrary lengths, so '{:>9999999999}'.format(..) could run the server out of memory and be a DOS. I'd need to disable this.
- Format lets you access the fields inside objects, which is useful, but it's creepy that you can access dunder variables and start drilling into bits of the standard library. There's no telling where there might be a getattr() that has side effects or returns something secret. I would whitelist attribute/index access by overriding get_field().
- I'd need to catch some exceptions, naturally.
My assumptions are:
- None of the traditional C format string exploits apply to Python, because specifying a parameter is a bounds-checked access into a collection, rather than directly popping off the thread's stack.
- The web framework I'm using escapes every variable that's substituted into a page template, and so long as it's the last stop before output, I'm safe from cross-site scripting attacks emerging from de-escaping.
What are your thoughts? Possible? Impossible? Merely unwise?
Edit: Armin Ronacher outlines a nasty information leak if you don't filter out dunder variable access, but seems to regard securing format() as feasible:
{local_foo.__init__.__globals__[secret_global]}
http://lucumr.pocoo.org/2016/12/29/careful-with-str-format/
(Personally, I didn't actually go the untrusted format() route in my product, but am updating for the sake of completeness)