My (Perl-based) application needs to let users enter regular expressions, which are matched against various strings behind the scenes. My plan so far has been to take the string and wrap it in something like
my $regex = eval { qr/$text/ };
if (my $error = $@) {
    # mangle $error to extract user-facing message
}
($text has been stripped of newlines ahead of time, since it's actually multiple regular expressions in a multi-line text field that I split.)
Are there any potential security risks with doing this, such as some weird input that could lead to arbitrary code execution? (Besides buffer overflow vulnerabilities in the regular expression engine, like CVE-2007-5116.) If so, are there ways to mitigate them?
Is there a better way to do this? Are there any Perl modules that help abstract the operations of turning user input into regular expressions (such as extracting error messages, or providing modifiers like /i, which I don't strictly need here, but would be nice)? I searched CPAN and didn't find much that was promising, but I entertain the possibility that I missed something.
There is some discussion about this over at The Monastery.
TLDR:
use re::engine::RE2 -strict => 1;
Make sure to add -strict => 1 to your use statement, or re::engine::RE2 will fall back to Perl's re. The following is a citation from Paul Wankadia (junyer), owner of the project on GitHub:
To sum up the important points:
It's safe from arbitrary code execution by default, but add "no re 'eval';" to prevent PERL5OPT (or anything else?) from setting it on you. I'm not sure whether doing so prevents everything.
Use a sub-process (fork) with BSD::Resource (even on Linux) to ulimit memory, and kill the child after some timeout.
Perhaps you could use a different regex engine that does not have the dangerous code tag support.
I haven't tried it, but there is a PCRE for Perl. You may also be able to limit or remove code support using this info on creating custom regex engines.
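The sub-process idea above can be sketched roughly as follows, using only core modules. This is a sketch under assumptions, not a hardened implementation: `match_with_timeout` is a hypothetical helper name, the memory limit via BSD::Resource is left out, and the timeout relies on the child's SIGALRM staying at its default action so that the operating system terminates the child even in the middle of a match.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: returns 1 (match), 0 (no match), or undef
# (timed out, killed, or the pattern failed to compile).
sub match_with_timeout {
    my ($pattern, $string, $seconds) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: with the default SIGALRM action, the process is
        # terminated when the timer fires, without waiting for perl
        # to reach a safe point inside the regex engine.
        $SIG{ALRM} = 'DEFAULT';
        alarm $seconds;

        no re 'eval';    # reject (?{ ... }) in the interpolated pattern
        my $matched = eval { $string =~ /$pattern/ ? 1 : 0 };

        # Exit status: 0 = match, 1 = no match, 2 = pattern error.
        # _exit skips END blocks and buffers inherited from the parent.
        POSIX::_exit(defined $matched ? ($matched ? 0 : 1) : 2);
    }

    # Parent: wait for the child and decode its fate.
    waitpid($pid, 0);
    return undef if $? & 127;    # child was killed by a signal (e.g. SIGALRM)
    my $status = $? >> 8;
    return $status == 0 ? 1 : $status == 1 ? 0 : undef;
}
```

Forking once per match is expensive, so in practice one would probably hand the child a whole batch of strings and patterns and read the results back over a pipe.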
Using untrusted input as a regular expression creates a denial-of-service vulnerability, as described in perlsec:
The best way is not to give users too much privilege. Provide an interface that is just enough for users to do what they want (like an ATM with only buttons for the various options, and no need for keyboard input). Of course, if you need users to key in input, then provide a text box and, at the back end, use Perl to process the request (e.g. sanitizing it). The motive behind letting your users input a regex is to search for string patterns, right? In that case, the simplest and most secure way is to have them input just the string. Then at the back end, you use Perl's regex to search for it. Is there any other compelling reason to have users input regexes themselves?
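If plain-string search really is enough, one way to do it is to quote the user's text with \Q...\E (quotemeta), so that every metacharacter is matched literally. A minimal sketch, where `contains_literal` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: true if $needle occurs literally in $haystack.
sub contains_literal {
    my ($haystack, $needle) = @_;
    # \Q...\E backslash-escapes regex metacharacters in $needle,
    # so the user's text is never interpreted as a pattern.
    return $haystack =~ /\Q$needle\E/ ? 1 : 0;
}
```

For a pure substring test, index($haystack, $needle) >= 0 would do the same job without involving the regex engine at all.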
With the (?{ code }) construct, user input could be used to execute arbitrary code. See the example in perlre#code and where it says replace it with the expression
(Actually, don't do that.)
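For what it's worth, here is a minimal sketch of compiling user text with `no re 'eval'` in scope, so that a pattern containing (?{ code }) is refused when it arrives through interpolation, and the compile error is trimmed into something presentable. `compile_user_regex` is a hypothetical helper name, and the clean-up regex assumes the usual " at FILE line N." suffix that perl appends to such messages.

```perl
use strict;
use warnings;

# Hypothetical helper: returns ($regex, undef) on success,
# or (undef, $message) if the text is not an acceptable pattern.
sub compile_user_regex {
    my ($text) = @_;

    no re 'eval';    # refuse (?{ ... }) and (??{ ... }) in interpolated text
    my $regex = eval { qr/$text/ };
    return ($regex, undef) if defined $regex;

    # Strip the trailing " at FILE line N." (and an optional
    # ", <FH> line N") so the user sees only the regex complaint.
    my $error = $@;
    $error =~ s/\A(.*) at .+ line \d+(?:, <[^>]*> (?:line|chunk) \d+)?\.\n\z/$1/s;
    return (undef, $error);
}
```

Usage would look like `my ($regex, $error) = compile_user_regex($text);`, reporting `$error` back to the user whenever `$regex` is undefined. This addresses code execution only; it does nothing about patterns that are merely very slow.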