Python eval: is it still dangerous if I disable bu

Posted 2019-01-13 20:47

Question:

We all know that eval is dangerous, even if you hide dangerous functions, because you can use Python's introspection features to dig down into things and re-extract them. For example, even if you delete __builtins__, you can retrieve them with

[c for c in ().__class__.__base__.__subclasses__()  
 if c.__name__ == 'catch_warnings'][0]()._module.__builtins__

However, every example I've seen of this uses attribute access. What if I disable all builtins, and disable attribute access (by tokenizing the input with a Python tokenizer and rejecting it if it has an attribute access token)?
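
The token-level filter described above might look roughly like the following sketch. The names has_attribute_access and checked_eval are hypothetical, and this is not SymPy's actual code. It flags only a bare '.' operator token, so a float literal such as 1.5 passes.

```python
import io
import tokenize


def has_attribute_access(source):
    """Return True if the token stream contains a bare '.' operator token."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return any(tok.type == tokenize.OP and tok.string == "." for tok in tokens)


def checked_eval(source, namespace):
    # Hypothetical wrapper: refuse input containing a '.' token, then eval
    # with builtins removed. This narrows the attack surface; it is not a sandbox.
    if has_attribute_access(source):
        raise ValueError("attribute access is not allowed")
    return eval(source, {"__builtins__": {}}, namespace)
```

Whether such a filter is sufficient is exactly what the question asks, so treat it as an illustration of the idea rather than a recommendation.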

And before you ask, no, for my use-case, I do not need either of these, so it isn't too crippling.

What I'm trying to do is make SymPy's sympify function safer. Currently it tokenizes the input, does some transformations on it, and evals it in a namespace. But it's unsafe because it allows attribute access (even though it really doesn't need it).

Answer 1:

I'm going to mention one of the new features of Python 3.6 - f-strings.

They can evaluate expressions,

>>> eval('f"{().__class__.__base__}"', {'__builtins__': None}, {})
"<class 'object'>"

but the attribute access won't be detected by Python's tokenizer:

0,0-0,0:            ENCODING       'utf-8'        
1,0-1,1:            ERRORTOKEN     "'"            
1,1-1,27:           STRING         'f"{().__class__.__base__}"'
2,0-2,0:            ENDMARKER      '' 
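
One possible way around this, assuming Python 3.6 or later, is to inspect the parsed AST instead of the tokens, because the parser expands the replacement fields of an f-string into ordinary expression nodes. The helper name contains_attribute is hypothetical.

```python
import ast


def contains_attribute(source):
    """Return True if any ast.Attribute node appears, including inside f-strings."""
    tree = ast.parse(source, mode="eval")
    return any(isinstance(node, ast.Attribute) for node in ast.walk(tree))
```

This only closes the f-string gap; it says nothing about the other problems raised in this thread.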


Answer 2:

It is possible to construct a return value from eval that raises an exception outside eval whenever you try to print, log, or repr it:

eval('''((lambda f: (lambda x: x(x))(lambda y: f(lambda *args: y(y)(*args))))
        (lambda f: lambda n: (1,(1,(1,(1,f(n-1))))) if n else 1)(300))''')

This creates a nested tuple of the form (1,(1,(1,(1...; that value cannot be printed (on Python 3) or passed to str or repr. Every attempt to debug it leads to

RuntimeError: maximum recursion depth exceeded while getting the repr of a tuple

pprint and saferepr fail too:

...
  File "/usr/lib/python3.4/pprint.py", line 390, in _safe_repr
    orepr, oreadable, orecur = _safe_repr(o, context, maxlevels, level)
  File "/usr/lib/python3.4/pprint.py", line 340, in _safe_repr
    if issubclass(typ, dict) and r is dict.__repr__:
RuntimeError: maximum recursion depth exceeded while calling a Python object

Thus there is no safe built-in function to stringify this value. The following helper could be of use:

def excsafe_repr(obj):
    try:
        return repr(obj)
    except Exception:
        # Fall back to the default object repr, which does not recurse.
        return object.__repr__(obj).replace('>', ' [exception raised]>')

There is also the problem that print in Python 2 does not actually go through str/repr, so it has none of their recursion checks and offers no safety. That is, you cannot call str or repr on the return value of the lambda monster above, but the ordinary print statement (not print_function!) prints it nicely. You can exploit this to generate a SIGSEGV on Python 2 if you know the value will be printed using the print statement:

print eval('(lambda i: [i for i in ((i, 1) for j in range(1000000))][-1])(1)')

crashes Python 2 with SIGSEGV. This is WONTFIX in the bug tracker. Thus never use the print statement if you want to be safe; use from __future__ import print_function!


This is not a crash, but

eval('(1,' * 100 + ')' * 100)

when run, outputs

s_push: parser stack overflow
Traceback (most recent call last):
  File "yyy.py", line 1, in <module>
    eval('(1,' * 100 + ')' * 100)
MemoryError

The MemoryError can be caught, since it is a subclass of Exception. The parser has some really conservative limits to avoid crashes from stack overflows (pun intended). However, s_push: parser stack overflow is written to stderr by C code and cannot be suppressed.
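
If you would rather reject such input before it reaches the parser, a crude pre-check on bracket nesting is one option. The sketch below is an assumption-laden illustration: the names and the limit of 20 are made up, and it counts brackets inside string literals too, so it over-estimates. It does nothing about other deep constructs, such as long chains of calls.

```python
def max_bracket_depth(source):
    """Return the deepest nesting of (, [ and { seen in the raw text."""
    depth = deepest = 0
    for ch in source:
        if ch in "([{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch in ")]}":
            depth = max(depth - 1, 0)
    return deepest


def eval_with_depth_limit(source, limit=20):
    # Hypothetical guard: refuse deeply nested input before parsing it.
    if max_bracket_depth(source) > limit:
        raise ValueError("expression is nested too deeply")
    return eval(source, {"__builtins__": {}})
```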


And just yesterday I asked why Python 3.4 would not be fixed for a crash from

% python3  
Python 3.4.3 (default, Mar 26 2015, 22:03:40) 
[GCC 4.9.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> class A:
...     def f(self):
...         nonlocal __x
... 
[4]    19173 segmentation fault (core dumped)  python3

and Serhiy Storchaka's answer confirmed that Python core devs do not consider SIGSEGV on seemingly well-formed code a security issue:

Only security fixes are accepted for 3.4.

Thus it can be concluded that it can never be considered safe to execute any third-party code in Python, sanitized or not.

And Nick Coghlan then added:

And as some additional background as to why segmentation faults provoked by Python code aren't currently considered a security bug: since CPython doesn't include a security sandbox, we're already relying entirely on the OS to provide process isolation. That OS level security boundary isn't affected by whether the code is running "normally", or in a modified state following a deliberately triggered segmentation fault.
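
To make the process-isolation point concrete, here is a minimal sketch that evaluates an expression in a separate interpreter process with a timeout, so that a crash or hang in the child does not take down the caller. The name eval_in_child and the 5-second default are assumptions, and it needs Python 3.7 or later for capture_output. A timeout alone is not a sandbox: real isolation (memory limits, filesystem and network restrictions) has to come from the OS and is not shown here.

```python
import subprocess
import sys

# Code run by the child interpreter; the expression arrives as sys.argv[1].
_CHILD_CODE = "import sys; print(repr(eval(sys.argv[1], {'__builtins__': {}})))"


def eval_in_child(expr, timeout=5.0):
    """Evaluate expr in a separate interpreter and return the repr it printed."""
    result = subprocess.run(
        [sys.executable, "-I", "-c", _CHILD_CODE, expr],
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired when exceeded
    )
    if result.returncode != 0:
        # A negative status means the child was killed by a signal (e.g. SIGSEGV).
        raise RuntimeError("child failed with exit status %d" % result.returncode)
    return result.stdout.strip()
```

The caller gets back text rather than a live object, which also avoids the unprintable-value problem described earlier in this answer.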



Answer 3:

Users can still DoS you by inputting an expression that evaluates to a huge number, which would fill your memory and crash the Python process, for example

'10**10**100'

I am definitely still curious if more traditional attacks, like recovering builtins or creating a segfault, are possible here.

EDIT:

It turns out, even Python's parser has this issue.

lambda: 10**10**100

will hang, because it tries to precompute the constant.
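
As far as I know, parsing to an AST with ast.parse does not evaluate constant expressions (folding happens later, at compile time), so one way to sidestep this is to look for the power operator in the tree before compiling anything. The helper name uses_power is hypothetical, and the check does not cover other ways of building huge values, such as large shifts or string repetition.

```python
import ast


def uses_power(source):
    """Return True if the expression contains the ** operator anywhere."""
    tree = ast.parse(source, mode="eval")
    return any(isinstance(node, ast.Pow) for node in ast.walk(tree))
```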



Answer 4:

I don't believe Python is designed to have any security against untrusted code. Here's an easy way to induce a segfault via stack overflow (on the C stack) in the official Python 2 interpreter:

eval('()' * 98765)

From my answer to the "Shortest code that returns SIGSEGV" Code Golf question.



Answer 5:

Controlling the locals and globals dictionaries is extremely important. Otherwise, someone could just pass in eval or exec and call it recursively:

safe_eval('''e("""[c for c in ().__class__.__base__.__subclasses__() 
    if c.__name__ == \'catch_warnings\'][0]()._module.__builtins__""")''', 
    globals={'e': eval})

The expression in the recursive eval is just a string.

You also need to set the eval and exec names in the global namespace to something that isn't the real eval or exec. The global namespace is important. If you shadow them only in a local namespace, anything that creates a separate namespace, such as comprehensions and lambdas, will work around it:

safe_eval('''[eval("""[c for c in ().__class__.__base__.__subclasses__()
    if c.__name__ == \'catch_warnings\'][0]()._module.__builtins__""") for i in [1]][0]''', locals={'eval': None})

safe_eval('''(lambda: eval("""[c for c in ().__class__.__base__.__subclasses__()
    if c.__name__ == \'catch_warnings\'][0]()._module.__builtins__"""))()''',
    locals={'eval': None})

Again, here, safe_eval only sees a string and a function call, not attribute accesses.
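
The namespace behaviour can be seen with a harmless stand-in for eval. In this sketch, real_function and the dictionary names are made up for illustration: the name f is shadowed with None in locals, yet a lambda body still reaches the real object through globals.

```python
def real_function():
    return "reached the real function"


safe_globals = {"__builtins__": {}, "f": real_function}
shadowing_locals = {"f": None}

# A direct lookup finds the shadowed name in locals, so the call fails.
try:
    eval("f()", safe_globals, shadowing_locals)
    direct_result = "call succeeded"
except TypeError:
    direct_result = "blocked"

# Inside the lambda body, f is resolved in globals, skipping eval's locals.
lambda_result = eval("(lambda: f())()", safe_globals, shadowing_locals)
```

Here direct_result ends up as "blocked" while lambda_result holds the string returned by real_function, which is why the shadowing has to happen in globals.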

You also need to clear out the safe_eval function itself, if it has a flag to disable safe parsing. Otherwise you could simply do

safe_eval('safe_eval("<dangerous code>", safe=False)')


Answer 6:

Here is a safe_eval example which ensures that the evaluated expression does not contain unsafe tokens. It does not take the literal_eval approach of interpreting the AST; instead it whitelists the node types and uses the real eval if the expression passes the test.

# license: MIT (C) tardyp
import ast
import unittest


def safe_eval(expr, variables):
    """
    Safely evaluate a string containing a Python
    expression.  The string or node provided may only consist of the following
    Python literal structures: strings, numbers, tuples, lists, dicts, booleans,
    and None. safe operators are allowed (and, or, ==, !=, not, +, -, ^, %, in, is)
    """
    _safe_names = {'None': None, 'True': True, 'False': False}
    # 'Constant' covers literals on Python 3.8+, where Num and Str are no
    # longer produced by the parser.
    _safe_nodes = [
        'Add', 'And', 'BinOp', 'BitAnd', 'BitOr', 'BitXor', 'BoolOp',
        'Compare', 'Constant', 'Dict', 'Eq', 'Expr', 'Expression', 'For',
        'Gt', 'GtE', 'Is', 'In', 'IsNot', 'LShift', 'List',
        'Load', 'Lt', 'LtE', 'Mod', 'Name', 'Not', 'NotEq', 'NotIn',
        'Num', 'Or', 'RShift', 'Set', 'Slice', 'Str', 'Sub',
        'Tuple', 'UAdd', 'USub', 'UnaryOp', 'boolop', 'cmpop',
        'expr', 'expr_context', 'operator', 'slice', 'unaryop']
    node = ast.parse(expr, mode='eval')
    for subnode in ast.walk(node):
        subnode_name = type(subnode).__name__
        if isinstance(subnode, ast.Name):
            if subnode.id not in _safe_names and subnode.id not in variables:
                raise ValueError("Unsafe expression {}. contains {}".format(expr, subnode.id))
        if subnode_name not in _safe_nodes:
            raise ValueError("Unsafe expression {}. contains {}".format(expr, subnode_name))

    return eval(expr, variables)



class SafeEvalTests(unittest.TestCase):

    def test_basic(self):
        self.assertEqual(safe_eval("1", {}), 1)

    def test_local(self):
        self.assertEqual(safe_eval("a", {'a': 2}), 2)

    def test_local_bool(self):
        self.assertEqual(safe_eval("a==2", {'a': 2}), True)

    def test_lambda(self):
        self.assertRaises(ValueError, safe_eval, "lambda : None", {'a': 2})

    def test_bad_name(self):
        self.assertRaises(ValueError, safe_eval, "a == None2", {'a': 2})

    def test_attr(self):
        self.assertRaises(ValueError, safe_eval, "a.__dict__", {'a': 2})

    def test_eval(self):
        self.assertRaises(ValueError, safe_eval, "eval('os.exit()')", {})

    def test_exec(self):
        self.assertRaises(SyntaxError, safe_eval, "exec 'import os'", {})

    def test_multiply(self):
        self.assertRaises(ValueError, safe_eval, "'s' * 3", {})

    def test_power(self):
        self.assertRaises(ValueError, safe_eval, "3 ** 3", {})

    def test_comprehensions(self):
        self.assertRaises(ValueError, safe_eval, "[i for i in [1,2]]", {'i': 1})