Can I build a Perl regex from a set of hash keys?

Posted 2019-06-24 05:37

(related to previous question: Do I need to reset a Perl hash index?)

I have a hash coming in from a file which is defined as follows:

%project_keys = (
    cd     => "continuous_delivery",
    cm     => "customer_management",
    dem    => "demand",
    dis    => "dis",
    do     => "devops",
    sel    => "selection",
    seo    => "seo"
);

I need to check whether a review title has the correct format, and if so, link to a separate URL.

For instance, if a review title is

"cm1234 - Do some CM work"

then I want to link to the following URL:

http://projects/customer_management/setter/1234

Currently, I'm using the following (hard-coded) regex:

if ($title =~ /(cd|cm|dem|dis|do|sel|seo)(\d+)\s.*/) {
    my $url = 'http://projects/'.$project_keys{$1}.'/setter/'.$2
}

but obviously I'd like to build the regex from the hash keys themselves (the hash above will change fairly frequently). I thought about naively concatenating the keys as follows:

# Build the regex
my $regex = '';
foreach my $key ( keys %project_keys ) {
    $regex += $key + '|';
}
$regex = substr($regex, 0, -1); # Chop off the last pipe
$regex = '('.$regex.')(\d+)\s.*';
if ($title =~ /$regex/) {
    my $url = 'http://projects/'.$project_keys{$1}.'/setter/'.$2
}

but a) it's not working as I would wish, and b) I assume there's a much better Perl way to do this. Or is there?

Tags: regex perl hash
1 answer
女痞
Post #2 · 2019-06-24 06:00

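Your main problem comes from trying to use + to join strings. In Perl, + is always numeric addition, so non-numeric strings such as 'cd' and '|' are treated as 0. The string concatenation operator is . (a single dot). That said, a loop that builds a string by concatenation can often be written better with join instead.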
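To make the difference concrete, here is a minimal sketch comparing +, . and join. The variable names are only for illustration and are not from your code:

```perl
use strict;
use warnings;

my @keys = ('cd', 'cm', 'dem');
my ($key, $pipe) = ('cd', '|');

# + is numeric addition: both strings are non-numeric, so each counts as 0.
# (The "isn't numeric" warnings are silenced here just for the demo.)
my $added = do { no warnings 'numeric'; $key + $pipe };

# . is string concatenation.
my $dotted = $key . $pipe . 'cm';

# join places the separator only between elements,
# so there is no trailing pipe to chop off afterwards.
my $joined = join '|', @keys;

print "$added\n";    # 0
print "$dotted\n";   # cd|cm
print "$joined\n";   # cd|cm|dem
```

With warnings enabled (and not silenced), the + version would also have told you that the arguments aren't numeric, which is a good reason to always use strict and warnings.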

I would suggest:

my $project_match = join '|', map quotemeta, keys %project_keys;

if ($title =~ /($project_match)(\d+)\s/) {
   my $url = 'http://projects/'.$project_keys{$1}.'/setter/'.$2;
   # Something with $url
}

quotemeta is a function that escapes any regex metacharacters that occur in a string. There aren't any in your example, but it's good practice to use it always and avoid unexpected bugs.
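As a rough illustration of why that matters, here is a sketch with two made-up keys that do contain metacharacters. The keys 'c++' and 'a.b' and their values are hypothetical, not part of your hash:

```perl
use strict;
use warnings;

# Hypothetical keys containing regex metacharacters.
my %project_keys = (
    'c++' => 'cplusplus',
    'a.b' => 'alpha_beta',
);

# sort is only here to make the printed pattern predictable.
my $project_match = join '|', map quotemeta, sort keys %project_keys;
print "$project_match\n";   # a\.b|c\+\+

my $title = 'c++42 - Fix the build';
my $url   = '';
if ($title =~ /($project_match)(\d+)\s/) {
    $url = 'http://projects/' . $project_keys{$1} . '/setter/' . $2;
}
print "$url\n";             # http://projects/cplusplus/setter/42
```

Without quotemeta, the plus signs in 'c++' would be read as quantifiers and the dot in 'a.b' would match any character, so the pattern would quietly match the wrong titles.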

I left out the trailing .* in your pattern, because there's no need to say "and then some stuff, or maybe no stuff" if you don't actually do anything with the stuff. The pattern doesn't need to match the entire string, unless you anchor it to the beginning and end of the string.
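If you do want the key to appear only at the very start of the title, you could anchor the pattern with ^. The following is just a sketch under that assumption; the second title (with a "Re: " prefix) and the reduced two-key hash are invented for the comparison:

```perl
use strict;
use warnings;

my %project_keys = (
    cm  => 'customer_management',
    sel => 'selection',
);
my $project_match = join '|', map quotemeta, sort keys %project_keys;

# Unanchored: the key may appear anywhere in the title.
my $unanchored = qr/($project_match)(\d+)\s/;
# Anchored: the key must be the first thing in the title.
my $anchored   = qr/^($project_match)(\d+)\s/;

my %result;
for my $title ('cm1234 - Do some CM work', 'Re: cm1234 - Do some CM work') {
    $result{$title} = {
        unanchored => ($title =~ $unanchored ? 'yes' : 'no'),
        anchored   => ($title =~ $anchored   ? 'yes' : 'no'),
    };
    print "$title: unanchored=$result{$title}{unanchored}, ",
          "anchored=$result{$title}{anchored}\n";
}
```

Both patterns match the first title, but only the unanchored one matches the "Re: " title. Which behavior you want depends on how strict the title format is supposed to be.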
