How can I safely use regexes from user input?

Posted 2019-03-25 08:34

Question:

My (Perl-based) application needs to let users input regular expressions, to match various strings behind the scenes. My plan so far has been to take the string and wrap it in something like

$regex = eval { qr/$text/ };
if (my $error = $@) {
   # mangle $error to extract user-facing message
}

($text having been stripped of newlines ahead of time, since it's actually multiple regular expressions in a multi-line text-field that I split).
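A minimal sketch of that plan, assuming one pattern per line of the text field. The helper name compile_patterns and the "pattern N:" prefix are made up for illustration, and the substitution that trims the trailing " at FILE line N." is only one possible way to get a user-facing message:

```perl
use strict;
use warnings;

# Compile each non-empty line of $text as its own pattern.
# Returns (\@regexes, \@errors). All names here are illustrative.
sub compile_patterns {
    my ($text) = @_;
    my (@regexes, @errors);
    my $line_no = 0;
    for my $line (split /\r?\n/, $text) {
        $line_no++;
        next if $line eq '';
        my $regex = eval { qr/$line/ };
        if (defined $regex) {
            push @regexes, $regex;
            next;
        }
        my $error = $@;
        # Drop the trailing " at FILE line N." that points into our own code.
        $error =~ s/ at .+? line \d+\.?\s*\z//s;
        push @errors, "pattern $line_no: $error";
    }
    return (\@regexes, \@errors);
}

my ($regexes, $errors) = compile_patterns("foo\n(\nba+r");
print scalar(@$regexes), " pattern(s) compiled\n";
print "$_\n" for @$errors;
```

Testing whether the compiled value is defined, rather than testing $@ alone, avoids trusting an error variable that something else may have reset.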

Are there any potential security risks in doing this, such as some weird input that could lead to arbitrary code execution (besides buffer overflow vulnerabilities in the regular expression engine, like CVE-2007-5116)? If so, are there ways to mitigate them?

Is there a better way to do this? Any Perl modules which help abstract the operations of turning user input into regular expressions (such as extracting error messages ... or providing modifiers like /i, which I don't strictly need here, but would be nice)? I searched CPAN and didn't find much that was promising, but entertain the possibility that I missed something.

Answer 1:

With the (?{ code }) construct, user input could be used to execute arbitrary code. See the example in perlre#code and where it says

local $cnt = $cnt + 1,

replace it with the expression

system("rm -rf /home/fennec"); print "Ha ha.\n";

(Actually, don't do that.)
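Whether such a block actually runs depends on how the pattern is compiled. As far as I know, perlre documents that (?{ code }) is refused when the pattern is interpolated from a variable at run time, unless use re 'eval' is in effect. A minimal sketch to check this on your own Perl, using a harmless print in place of anything destructive (the $text value is a made-up example):

```perl
use strict;
use warnings;

# Hypothetical user input that tries to smuggle in a code block.
my $text = '(?{ print "code ran\n" })abc';

my ($regex, $error);
{
    no re 'eval';    # keep run-time code blocks disabled in this scope
    $regex = eval { qr/$text/ };
    $error = $@;
}

if (defined $regex) {
    print "compiled: $regex\n";
}
else {
    print "rejected: $error";
}
```

If use re 'eval' is active in the scope where the pattern is compiled, the same input is accepted and the embedded code runs at match time, which is the risk described above.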



Answer 2:

Using untrusted input as a regular expression creates a denial-of-service vulnerability, as described in perlsec:

Regular expressions - Perl's regular expression engine is so called NFA (Non-deterministic Finite Automaton), which among other things means that it can rather easily consume large amounts of both time and space if the regular expression may match in several ways. Careful crafting of the regular expressions can help but quite often there really isn't much one can do (the book "Mastering Regular Expressions" is required reading, see perlfaq2). Running out of space manifests itself by Perl running out of memory.



Answer 3:

The best way is not to give users more privilege than they need. Provide an interface that is just enough for users to do what they want (like an ATM that has only buttons for the various options, with no need for keyboard input). Of course, if you need users to type input, provide a text box and process the request on the back end with Perl (e.g. sanitizing it). The motive behind letting your users input a regex is to search for string patterns, right? In that case, the simplest and most secure way is to ask them to input just the string, and then use Perl's regex on the back end to search for it. Is there any other compelling reason to have users input regexes themselves?
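A minimal sketch of the "just the string" approach, assuming the user input should match only itself. \Q...\E (quotemeta) escapes the metacharacters before the pattern is compiled; the $input value and the candidate strings are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical user input, full of regex metacharacters.
my $input = 'a.b*(c)';

# \Q...\E escapes every metacharacter, so the input
# can only ever match itself literally.
my $literal = qr/\Q$input\E/;

for my $candidate ('xx a.b*(c) yy', 'aXbbbc') {
    printf "%-15s %s\n", $candidate,
        $candidate =~ $literal ? 'match' : 'no match';
}

# For a plain substring test, index() avoids the regex engine entirely.
print "found with index\n" if index('xx a.b*(c) yy', $input) >= 0;
```

Without the \Q...\E, the second candidate would match, because the input would be interpreted as a pattern rather than as text.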



Answer 4:

There is some discussion about this over at The Monastery.

TLDR: use re::engine::RE2 -strict => 1;

Make sure to add -strict => 1 to your use statement or re::engine::RE2 will fall back to Perl's re.

The following is a quote from Paul Wankadia (junyer), owner of the project on GitHub:

RE2 was designed and implemented with an explicit goal of being able to handle regular expressions from untrusted users without risk. One of its primary guarantees is that the match time is linear in the length of the input string. It was also written with production concerns in mind: the parser, the compiler and the execution engines limit their memory usage by working within a configurable budget – failing gracefully when exhausted – and they avoid stack overflow by eschewing recursion.

To sum up the important points:

  • Perl is safe from arbitrary code execution by default, because (?{ code }) blocks in patterns interpolated at run time are rejected unless use re 'eval' is in effect. Still, add "no re 'eval';" to prevent PERL5OPT or anything else from turning it on for you. I'm not sure whether doing so prevents everything.

  • Use a sub-process (fork) with BSD::Resource (even on Linux) to ulimit memory, and kill the child after some timeout.
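The two points above can be sketched roughly as follows. This is an illustration under assumptions, not a hardened implementation: match_with_timeout is a made-up helper name, the 256 MB budget and the exit-code convention are arbitrary, and the memory cap is applied only if BSD::Resource happens to be installed. The match runs in a child because, as far as I understand perlipc, Perl defers signals until the current opcode finishes, so an alarm in the same process may not interrupt a long-running match.

```perl
use strict;
use warnings;
use POSIX ();

# Run one match in a child process and give up after $timeout seconds.
# Returns 1 (matched), 0 (no match), or undef (timed out, killed,
# or the pattern did not compile).
sub match_with_timeout {
    my ($pattern, $string, $timeout) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: cap address space if BSD::Resource is available (optional).
        eval {
            require BSD::Resource;
            my $limit = 256 * 1024 * 1024;    # arbitrary budget
            BSD::Resource::setrlimit(BSD::Resource::RLIMIT_AS(),
                                     $limit, $limit);
        };
        no re 'eval';                         # never allow (?{ code })
        my $re = eval { qr/$pattern/ };
        POSIX::_exit(2) unless defined $re;   # pattern did not compile
        POSIX::_exit($string =~ $re ? 0 : 1);
    }

    # Parent: wait for the child, but no longer than $timeout seconds.
    my $timed_out = 0;
    local $SIG{ALRM} = sub { $timed_out = 1; kill 'KILL', $pid };
    alarm $timeout;
    waitpid($pid, 0);
    alarm 0;

    my $signal = $? & 127;
    my $code   = $? >> 8;
    return if $timed_out || $signal || $code > 1;
    return $code == 0 ? 1 : 0;
}

for my $case (['^ab+c$', 'abbbc'], ['^x', 'abc'], ['(', 'abc']) {
    my $result = match_with_timeout(@$case, 5);
    printf "%-8s %s\n", $case->[0], defined $result ? $result : 'failed';
}
```

Forking once per match is expensive; a longer-lived worker process that handles many matches would be a natural refinement, at the cost of more plumbing.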



Answer 5:

Perhaps you could use a different regex engine that does not have the dangerous code tag support.

I haven't tried it, but there is a PCRE module for Perl. You may also be able to limit or remove code support using this info on creating custom regex engines.