I have a website where the user enters math equations (expressions) and then those equations are evaluated against data (constants) provided by the website. The math operations needed include symbols, arithmetic operations, min(), max() and some other basic functions. A sample equation could be:
max(a * b + 100, a / b - 200)
One could simply eval() this using Python, but as we all know this leads to compromising the site. What would be a safe approach to evaluating math equations?
If one chooses to use Python itself to evaluate the expression, are there any Python sandboxes which would limit the Python so that only user-supplied operators and functions are available? Full-fledged Python, like defining functions, should be totally disabled. Subprocesses are ok (see PyPy sandbox). In particular, for loops and other holes for exploiting memory and CPU usage should be closed.
Disclaimer: I'm the Alexer mentioned in the code in the other answer. To be honest, I kind of suggested the bytecode parsing approach only half-jokingly, since I happened to have 99% of the code lying around for an unrelated project and so could whip together a POC in like a couple of minutes. That said, there shouldn't be anything wrong with it, per se; it's just more complex machinery than is needed for this task. In fact, you should be able to get away with just disassembling the code [checking the opcodes against a whitelist], checking that the constants and names are valid, and executing it with plain, evil eval after that. You would just lose the ability to insert paranoid extra checks all over the execution. (Another disclaimer: I still wouldn't feel comfortable enough to do it with eval.)
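The disassemble-and-check idea could look roughly like the sketch below. It is not the POC mentioned above, just an illustration built on the standard dis module. The helper names (inspect_expression, violations) and the allowed names are made up for this example. Opcode names differ between CPython versions, so instead of hard-coding a whitelist the sketch assumes it is acceptable to derive one from a trusted sample expression on the running interpreter.

```python
import dis

# Assumption for this sketch: the only names a user may reference.
ALLOWED_NAMES = {"a", "b", "min", "max"}


def inspect_expression(expression):
    """Compile in eval mode and return (opcode names, names, constants)."""
    code = compile(expression, "<user expression>", "eval")
    opnames = {ins.opname for ins in dis.get_instructions(code)}
    return opnames, set(code.co_names), code.co_consts


def violations(expression, allowed_opcodes, allowed_names=ALLOWED_NAMES):
    """Return a list of reasons to reject the expression (empty if none)."""
    opnames, names, consts = inspect_expression(expression)
    problems = ["opcode " + op for op in sorted(opnames - allowed_opcodes)]
    problems += ["name " + n for n in sorted(names - allowed_names)]
    # Only plain numbers pass; None is tolerated because some CPython
    # versions store it in co_consts even when the source never uses it.
    problems += [
        "constant " + repr(c)
        for c in consts
        if c is not None and type(c) not in (int, float)
    ]
    return problems


# Hypothetical whitelist: whatever opcodes a trusted sample compiles to
# on this interpreter. A real deployment would review this set by hand.
ALLOWED_OPCODES, _, _ = inspect_expression("max(a * b + 100, a / b - 200)")

print(violations("max(a * b + 100, a / b - 200)", ALLOWED_OPCODES))
print(violations("a.__class__", ALLOWED_OPCODES))
```

Note that on newer CPython versions a single BINARY_OP opcode covers every binary operator, so an opcode whitelist alone may not be able to keep out something like a ** b. That is one more reason to stay paranoid about handing the result to eval.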
Anyway, I had a boring moment, so I wrote some code to do this the smart way: using the AST instead of the bytecode. It's just an extra flag to compile(). (Or just ast.parse(), since you'll want the types from the module anyway.)
The same thing applies to this as to the bytecode version: if you check the operations against a whitelist and check that the names and values are valid, you should be able to get away with calling eval on the AST. (But again, I still wouldn't do it. Because paranoid. And paranoia is good when eval is concerned.)
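A minimal sketch of the AST idea, for illustration only (this is not the code referred to above). It walks the tree itself instead of calling eval, so anything not explicitly whitelisted is rejected. The operator and function tables, and the name safe_eval, are assumptions for this example. It relies on ast.Constant, which ast.parse() produces on Python 3.8 and later.

```python
import ast
import operator

# Assumed whitelists for this sketch; extend them deliberately.
BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}
FUNCTIONS = {"min": min, "max": max}


def safe_eval(expression, variables):
    """Evaluate an arithmetic expression using only whitelisted nodes."""
    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body, variables)


def _eval(node, variables):
    if isinstance(node, ast.Constant):
        # Plain numbers only: no strings, bytes, booleans or None.
        if type(node.value) in (int, float):
            return node.value
        raise ValueError("disallowed constant: %r" % (node.value,))
    if isinstance(node, ast.Name):
        if node.id in variables:
            return variables[node.id]
        raise ValueError("unknown name: " + node.id)
    if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
        left = _eval(node.left, variables)
        right = _eval(node.right, variables)
        return BINARY_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
        return UNARY_OPS[type(node.op)](_eval(node.operand, variables))
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id in FUNCTIONS
        and not node.keywords
    ):
        args = [_eval(arg, variables) for arg in node.args]
        return FUNCTIONS[node.func.id](*args)
    # Attribute access, power, lambdas, comprehensions, subscripts, etc.
    raise ValueError("disallowed syntax: " + type(node).__name__)


print(safe_eval("max(a * b + 100, a / b - 200)", {"a": 6, "b": 3}))  # 118
```

The power operator is left out on purpose, since something like 9 ** 9 ** 9 is an easy way to eat memory and CPU. A very long or deeply nested input can still make the parser or this recursive walk blow up, so a length limit on the expression would be a sensible extra check.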
There is a relatively easy way of doing this in Python without third-party packages:
Use compile() to turn a single-line Python expression into bytecode, as you would for eval().
Do not run the bytecode through eval(); instead, run it in your own custom opcode loop and implement only the opcodes you really need. E.g. no built-ins and no attribute access, so the sandbox cannot be escaped.
However, there are some gotchas, like preparing for CPU exhaustion and memory exhaustion, which are not specific to this method and are issues with other approaches too.
Here is a full blog post about the topic. Here is a related gist. Below is shortened sample code.