perl string catenation and substitution in a singl

Posted 2019-07-27 06:54

Question:

I need to modify a Perl variable containing a file path; it needs to begin and end with a forward slash (/) and have every run of multiple forward slashes reduced to a single slash.

(This is because an existing process does not enforce a consistent configuration syntax, so there are hundreds of config files scattered everywhere that may or may not have slashes in the right places in file names and path names.)

Something like this:

foreach ( ($config->{'backup_path'},
           $config->{'work_path'},
           $config->{'output_path'}
         ) ) {
     $_ = "/" . $_ . "/";
     $_ =~ s/\/{2,}/\//g;
}

but this does not look optimal or particularly readable to me; I'd rather have a more elegant expression (if it ends up using an unusual regex I'll use a comment to make it clearer.)

Input & output examples

home/datamonster//c2counts becomes /home/datamonster/c2counts/

home/////teledyne/tmp/ becomes /home/teledyne/tmp/

and /var/backup/DOC/all_instruments/ will pass through unchanged

Answer 1:

Well, just rewriting what you got:

my @vars = qw ( backup_path work_path output_path );

for ( @{$config}{@vars} ) {
   s,^/*,/,;  #prefix
   s,/*$,/,; #suffix
   s,/+,/,g; #double slashes anywhere else. 
}

I'd be cautious: optimising for a single magic regex is not always an advantage, because such patterns become unreadable quite quickly.

The above uses a hash slice to select values out of a hash (a hash reference in this case), and relies on s/// operating on $_ by default. Because $_ is aliased to each element inside the loop, the substitution modifies the original values.
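As a minimal sketch of that aliasing behaviour (the hash contents and the extra name key are made-up sample data, not taken from the question), only the keys named in the slice are changed, and they are changed in place:

```perl
use strict;
use warnings;

# Hypothetical sample data, for illustration only.
my $config = {
    backup_path => 'home//backup',
    work_path   => '/tmp/work/',
    name        => 'a//b',          # not in the slice, so it is left untouched
};

my @vars = qw( backup_path work_path );

# @{$config}{@vars} is a hash slice: the values stored under those keys.
# Inside the loop $_ is an alias for each value, so s/// edits the hash itself.
for ( @{$config}{@vars} ) {
    s,^/*,/,;    # exactly one leading slash
    s,/*$,/,;    # exactly one trailing slash
    s,/+,/,g;    # collapse any remaining runs of slashes
}

print "$_ => $config->{$_}\n" for sort keys %$config;
```

No assignment back into the hash is needed, which is what keeps the loop body down to the three substitutions.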

It's also useful to know that when a pattern contains /, you can switch delimiters and avoid the "leaning toothpicks" effect.

s/\/{2,}/\//g can be written as:

s,/+,/,g

or

 s|/{2,}|/|g

if you want to keep the numeric quantifier. + means "one or more", so unlike the original pattern it also matches a single / (and replaces it with /), but the result is the same because a run of slashes still collapses to one. Don't use , as the delimiter if your pattern contains a comma, for the same reason you'd avoid /.
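A quick way to convince yourself that the three spellings are interchangeable is to run them on copies of the same string. This is only a sketch; the variable names are arbitrary and the input is one of the samples from the question:

```perl
use strict;
use warnings;

my $path = 'home/////teledyne/tmp/';    # sample input from the question

# Each line copies $path into a new variable and edits the copy.
( my $toothpicks = $path ) =~ s/\/{2,}/\//g;    # default delimiter, escaped slashes
( my $pipes      = $path ) =~ s|/{2,}|/|g;      # same pattern, | as the delimiter
( my $commas     = $path ) =~ s,/+,/,g;         # + also rewrites a single / to itself

print "$_\n" for $toothpicks, $pipes, $commas;
```

All three copies end up as home/teledyne/tmp/, and $path itself is left unchanged.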

However I think this does the trick;

s,(?:^/*|\b\/*$|/+),/,g for @{$config}{qw ( backup_path work_path output_path )};

This matches an alternation grouping, replacing either:

  • start of line, zero or more /
  • word boundary, zero or more /, end of line
  • one or more slashes anywhere else.

with a single /.

This uses the hash slice mechanism as above, but without the intermediate @vars.

(The second alternative needs the zero-width word boundary \b. Without it, /*$ first matches the trailing slashes and then, because of /g, matches once more as an empty string at the very end, so a path that already ends in / gets a second slash appended.)
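The effect of the \b is easiest to see by running both variants side by side. The normalise helper below is hypothetical, written only for this comparison; it returns a modified copy so the inputs stay intact:

```perl
use strict;
use warnings;

# Hypothetical helper: apply the combined pattern with or without \b.
sub normalise {
    my ( $path, $use_boundary ) = @_;
    my $re = $use_boundary
        ? qr{^/*|\b/*$|/+}    # the pattern from the answer
        : qr{^/*|/*$|/+};     # same pattern with \b removed
    ( my $copy = $path ) =~ s{$re}{/}g;
    return $copy;
}

for my $path ( '/fish/', 'narf//zoit' ) {
    printf "%-12s with \\b: %-14s without: %s\n",
        $path, normalise( $path, 1 ), normalise( $path, 0 );
}
# '/fish/' comes back as '/fish/' with \b, but as '/fish//' without it.
```

Paths that do not already end in a slash, such as narf//zoit, come out the same either way; only an existing trailing slash triggers the extra match.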

For bonus points - you could probably select @vars using grep if your source data structure is appropriate:

my @vars = grep { /_path$/ } keys %$config; 
#etc. Or inline with:
s,(?:^/*|\b\/*$|/+),/,g for @{$config}{grep { /_path$/ } keys %$config };

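Edit: Or, as Borodin notes (written here with brace delimiters, because | cannot act as both the delimiter and the alternation operator):

s{(?:/|\A|\z)/*}{/}g

Giving us (with \b in front of \z, so that a path already ending in a slash does not gain a second one):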

#!/usr/bin/env perl

use strict;
use warnings;
use Data::Dumper;

my $config = {
   backup_path => "/fish/",
   work_path   => "narf//zoit",
   output_path => "/wibble",
   test_path => 'home/datamonster//c2counts',
   another_path => "/home/teledyne/tmp/",
   again_path => 'home/////teledyne/tmp/',
   this_path => '/var/backup/DOC/all_instruments/',
};

s,(?:/|\A|\b\z)/*,/,g for @{$config}{grep { /_path$/ } keys %$config };

print Dumper $config;

Results:

$VAR1 = {
          'output_path' => '/wibble/',
          'this_path' => '/var/backup/DOC/all_instruments/',
          'backup_path' => '/fish/',
          'work_path' => '/narf/zoit/',
          'test_path' => '/home/datamonster/c2counts/',
          'another_path' => '/home/teledyne/tmp/',
          'again_path' => '/home/teledyne/tmp/'
        };


Answer 2:

You could do it like this, but I wouldn't call it more readable:

foreach ( ($config->{'backup_path'},
           $config->{'work_path'},
           $config->{'output_path'}
         ) ) {
     ( $_ = "/$_/" ) =~ s/\/{2,}/\//g;
}


Answer 3:

This question has already received many fantastic answers.

From the point of view of a non-Perl-expert (me), some are hard to read and understand. ;)

So, I would probably use this:

my @vars = qw ( backup_path work_path output_path );
for my $var (@vars) {
    my $value = '/' . $config->{$var} . '/';
    $value =~ s|//+|/|g;
    $config->{$var} = $value;
}

For me, this will still be readable after a year. :)