I'm writing an abstraction function that will ask the user a given question and validate the answer based on a given regular expression. The question is repeated until the answer matches the validation regexp. However, I also want the client to be able to specify whether the answer must match case-sensitively or not. So something like this:
sub ask {
    my ($prompt, $validationRe, $caseSensitive) = @_;
    my $modifier = ($caseSensitive) ? "" : "i";
    my $ans;
    my $isValid;
    do {
        print $prompt;
        $ans = <>;
        chomp($ans);
        # What I want to do that doesn't work:
        # $isValid = $ans =~ /$validationRe/$modifier;
        # What I have to do:
        $isValid = ($caseSensitive) ?
            ($ans =~ /$validationRe/) :
            ($ans =~ /$validationRe/i);
    } while (!$isValid);
    return $ans;
}
Upshot: is there any way to dynamically specify a regular expression's modifiers?
From perldoc perlre:
"(?adlupimsx-imsx)"
"(?^alupimsx)"
One or more embedded pattern-match modifiers, to be turned on (or
turned off, if preceded by "-") for the remainder of the pattern or
the remainder of the enclosing pattern group (if any).
This is particularly useful for dynamic patterns, such as those read
in from a configuration file, taken from an argument, or specified in
a table somewhere. Consider the case where some patterns want to be
case-sensitive and some do not: The case-insensitive ones merely need
to include "(?i)" at the front of the pattern.
Which gives you something along the lines of
$isValid = $ans =~ m/(?$modifier)$validationRe/;
Just be sure to take the appropriate security precautions when accepting user input in this way.
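One possible precaution, sketched below, is to check the modifier string against a whitelist before interpolating it. The helper name `build_regex` and the choice of allowed letters (`i`, `m`, `s`, `x`) are assumptions for illustration, not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: build a regex object from a pattern string and a
# string of modifier letters, rejecting anything other than i, m, s, x.
sub build_regex {
    my ($pattern, $modifier) = @_;
    $modifier = '' unless defined $modifier;
    die "Unsupported modifier '$modifier'\n"
        unless $modifier =~ /\A[imsx]*\z/;

    # Only emit the (?...) prefix when there is something to switch on.
    my $prefix = length $modifier ? "(?$modifier)" : '';
    return qr/$prefix$pattern/;
}

my $re = build_regex('yes|no', 'i');
print "YES" =~ $re ? "valid\n" : "invalid\n";
```

This only guards the modifier string; the pattern itself is still compiled as given, so it should come from a source you trust.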
You might also like the qr operator, which quotes its STRING as a regular expression.
my $rex = qr/(?$mod)$pattern/;
$isValid = <STDIN> =~ $rex;
Get rid of your $caseSensitive parameter, as it will be useless in many cases. Instead, users of that function can encode the necessary information directly in the $validationRe regex.
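A minimal sketch of what `ask` could look like without that parameter follows. The optional `$in` and `$out` handle arguments are hypothetical additions (defaulting to STDIN and STDOUT) so the sub can be exercised without a terminal. Returning at end of input is likewise an assumption, made to avoid looping forever.

```perl
use strict;
use warnings;

# Sketch of ask() without $caseSensitive. The caller decides about case
# sensitivity through the regex it passes in.
sub ask {
    my ($prompt, $validation_re, $in, $out) = @_;
    $in  ||= \*STDIN;     # hypothetical optional input handle
    $out ||= \*STDOUT;    # hypothetical optional output handle
    while (1) {
        print {$out} $prompt;
        my $ans = <$in>;
        return unless defined $ans;    # give up at end of input
        chomp $ans;
        return $ans if $ans =~ $validation_re;
    }
}

# Example call, case-insensitive because the regex object says so:
# my $answer = ask("Continue? [y/n] ", qr/\A[yn]\z/i);
```

A caller who wants a case-sensitive check simply passes `qr/\A[yn]\z/` instead.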
When you create a regex object like qr/foo/, the pattern is compiled at that point into instructions for the regex engine. If you stringify a regex object, you'll get a string that, when interpolated back into a regex, will have exactly the same behaviour as the original regex object. Most importantly, this means that all flags provided or omitted from the regex object literal will be preserved and can't be overridden! This is by design, so that a regex object will continue to behave identically no matter what context it is used in.
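The stringification can be inspected directly. This is a small sketch; the variable names are arbitrary, and the `(?^...)` form shown in the comments is what Perl 5.14 and later produce when no Unicode-related pragmas are in effect.

```perl
use strict;
use warnings;

my $insensitive = qr/foo/i;
my $sensitive   = qr/foo/;

# Stringifying a regex object shows the flags baked into it.
print "$insensitive\n";    # (?^i:foo)
print "$sensitive\n";      # (?^:foo)

# Interpolating the objects into a larger pattern keeps each one's flags.
my $combined = qr/\A$insensitive-$sensitive\z/;
print "FOO-foo" =~ $combined ? "match\n" : "fail\n";    # match
print "FOO-FOO" =~ $combined ? "match\n" : "fail\n";    # fail
```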
That's a bit dry, so let's use an example. Here is a match function that tries to apply a couple of similar regexes to a list of strings. Which one will match?
use strict;
use warnings;
use feature 'say';

# This sub takes a string to match on, a regex, and a case insensitive marker.
# The regex will be recompiled to anchor at the start and end of the string.
sub match {
    my ($str, $re, $i) = @_;
    return $str =~ /\A$re\z/i if $i;
    return $str =~ /\A$re\z/;
}

my @words = qw/foo FOO foO/;
my $real_regex = qr/foo/;
my $fake_regex = 'foo';

for my $re ($fake_regex, $real_regex) {
    for my $i (0, 1) {
        for my $word (@words) {
            my $match = 0+ match($word, $re, $i);
            my $output = qq("$word" =~ /$re/);
            $output .= "i" if $i;
            say "$output\t-->" . uc($match ? "match" : "fail");
        }
    }
}
Output:
"foo" =~ /foo/ -->MATCH
"FOO" =~ /foo/ -->FAIL
"foO" =~ /foo/ -->FAIL
"foo" =~ /foo/i -->MATCH
"FOO" =~ /foo/i -->MATCH
"foO" =~ /foo/i -->MATCH
"foo" =~ /(?^:foo)/ -->MATCH
"FOO" =~ /(?^:foo)/ -->FAIL
"foO" =~ /(?^:foo)/ -->FAIL
"foo" =~ /(?^:foo)/i -->MATCH
"FOO" =~ /(?^:foo)/i -->FAIL
"foO" =~ /(?^:foo)/i -->FAIL
First, we should notice that the string representation of regex objects has this weird (?^:...) form. In a non-capturing group (?: ... ), modifiers for the pattern inside the group can be added or removed between the question mark and colon, while the ^ indicates the default set of flags.
Now when we look at the fake regex that's actually just a string being interpolated, we can see that the addition of the /i flag makes a difference as expected. But when we use a real regex object, it doesn't change anything: the outside /i cannot override the (?^: ... ) flags.
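The group-scoped modifiers can be tried in isolation. The following sketch uses made-up patterns to show a flag being switched on for one group, and switched off for one group under an outer /i.

```perl
use strict;
use warnings;

# (?i:...) switches case-insensitivity on for the group only.
my $partly_insensitive = qr/\A(?i:foo)bar\z/;

# (?-i:...) switches it off for the group, even under an outer /i.
my $partly_sensitive = qr/\A(?-i:foo)bar\z/i;

for my $str (qw(foobar FOObar fooBAR)) {
    printf "%-6s  (?i:foo)bar: %-5s  (?-i:foo)bar with /i: %s\n",
        $str,
        ($str =~ $partly_insensitive ? 'match' : 'fail'),
        ($str =~ $partly_sensitive   ? 'match' : 'fail');
}
```

Here "FOObar" matches only the first pattern and "fooBAR" only the second, while "foobar" matches both.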
It is probably best to assume that all regexes already are regex objects and should not be interfered with. If you load the regex patterns from a file, you should require the regexes to use the (?: ... ) syntax to apply flags (e.g. (?^i:foo) as an equivalent to qr/foo/i). E.g. loading one regex per line from a file handle could look like:
my @regexes;
while (<$fh>) {
    chomp;
    push @regexes, qr/$_/; # will die here on regex syntax errors
}
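If dying on the first bad pattern is not wanted, one variation is to wrap the compilation in a block eval and collect the failures. This is a sketch under assumptions: the in-memory "file" and the `@errors` list are illustrative, and the `(?^i:...)` line needs Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical input standing in for a file of patterns, one per line.
my $patterns = "(?^i:foo)\nbar\n(unclosed\n";
open my $fh, '<', \$patterns or die "Cannot open in-memory file: $!";

my (@regexes, @errors);
while (my $line = <$fh>) {
    chomp $line;
    # qr// dies on a syntax error; the block eval turns that into undef.
    my $re = eval { qr/$line/ };
    if (defined $re) {
        push @regexes, $re;
    }
    else {
        push @errors, "line $.: $@";
    }
}
close $fh;

printf "%d compiled, %d rejected\n", scalar @regexes, scalar @errors;
```

The first pattern carries its own /i flag, so it matches "FOO", while the plain `bar` pattern stays case-sensitive.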
You can use the eval function. The code below will work. Note that \$ans and \$validationRe are escaped, so that only $modifier is interpolated into the evaluated string and the user's answer is never executed as code:
sub ask {
    my ($prompt, $validationRe, $caseSensitive) = @_;
    my $modifier = ($caseSensitive) ? "" : "i";
    my $ans;
    my $isValid;
    do {
        print $prompt;
        $ans = <>;
        chomp($ans);
        # What I want to do that doesn't work:
        # $isValid = $ans =~ /$validationRe/$modifier;
        # String eval: only $modifier ("" or "i") becomes part of the code.
        $isValid = eval "\$ans =~ /\$validationRe/$modifier";
        # What I have to do:
        #$isValid = ($caseSensitive) ?
        #    ($ans =~ /$validationRe/) :
        #    ($ans =~ /$validationRe/i);
    } while (!$isValid);
    return $ans;
}