How to perform search-and-replace within given $st

Posted 2019-02-19 11:50

Question:

Say a text file has many $start-$end pairs, and within each pair there is some text. I want Perl to find-and-replace all occurrences of $pattern within the $start-$end pairs; if $pattern lies outside a pair, it should not be replaced. For example, for the text:

xx START xx bingo xx bingo xx END xx bingo xx START xx bingo xx END bingo

There might be newlines anywhere in the text (not shown here), and $pattern may appear multiple times within a pair. The expected result is:

xx START xx okyes xx okyes xx END xx bingo xx START xx okyes xx END bingo

The job seems straightforward, but I just can't figure out a Perl regex to do it. Can anyone help with this?

Answer 1:

Looking at your source, I would suggest the trick here is to set $/, the input record separator.

If you set it to a single space, you can iterate word by word, and then use the range operator (..) to determine whether you're between the delimiters.

Example:

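#!/usr/bin/env perl

use strict;
use warnings;

# Each record now ends at a single space instead of a newline.
local $/ = ' ';

while ( <DATA> ) {
   # The flip-flop (..) is true from a record matching START up to
   # and including the next record matching END.
   if (  m/START/ .. /END/ ) {
       s/bingo/okyes/g;
   } 
   print;
}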

__DATA__
xx START xx bingo xx bingo xx END xx bingo xx START xx bingo xx END bingo

This prints:

xx START xx okyes xx okyes xx END xx bingo xx START xx okyes xx END bingo

You could probably accomplish this with a single regex, but I'd suggest that you don't, because it would be quite complicated and hard to understand later.
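One caveat with $/ = ' ': a newline is not a record boundary, so a record such as "bingo\nSTART " contains both an outside bingo and the START that opens the range, and that outside bingo gets replaced too (the same happens after END in a record like "END\nbingo "). A variation on the same flip-flop idea, sketched below under the assumption that the whole text fits in memory, splits on any whitespace and keeps the separators so the text is reassembled unchanged. The helper name replace_in_ranges is hypothetical, not part of the answer.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: apply the START..END flip-flop to
# whitespace-separated tokens rather than to records read with
# $/ = ' ', so newlines also act as token boundaries.
sub replace_in_ranges {
    my ($text) = @_;
    my $out = '';
    # The capturing group makes split return the separators too,
    # so concatenating the tokens reproduces the original spacing.
    for my $token (split /(\s+)/, $text) {
        # Scalar-context .. becomes true on a token matching START and
        # stays true up to and including the next token matching END.
        if ($token =~ /START/ .. $token =~ /END/) {
            $token =~ s/bingo/okyes/g;
        }
        $out .= $token;
    }
    return $out;
}

print replace_in_ranges("xx bingo\nSTART bingo xx\nEND\nbingo"), "\n";
```

Note that each .. operator keeps its state between calls, so text that ends inside an unclosed range would make the next call start "inside".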



Answer 2:

I find things like this are most simply done using the @- and @+ built-in arrays in conjunction with substr as an lvalue.

$-[1] contains the offset within the string where the first capture began, while $+[1] contains the offset where it ended. Hence $+[1]-$-[1] is the length of the captured section.

This program finds all occurrences of /START(.+?)END/ and edits the captured section (the region between START and END) by applying a regex substitution to that substring.

You may need to change this slightly depending on the real-world data that you're working with.

use strict;
use warnings 'all';
use feature 'say';

my $str = 'xx START xx bingo xx bingo xx END xx bingo xx START xx bingo xx END bingo';
my ($start, $end, $pattern, $replacement) = qw/ START END bingo okyes /;

while ( $str =~ /\b$start\b(.+?)\b$end\b/gs ) {
     substr($str, $-[1], $+[1]-$-[1]) =~ s/$pattern/$replacement/g;
}

say $str;

Output:

xx START xx okyes xx okyes xx END xx bingo xx START xx okyes xx END bingo
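To make the offsets concrete, here is a small sketch on a made-up string showing what $-[1] and $+[1] hold after a match, and how substr used as an lvalue lets a substitution edit just that slice of the original string.

```perl
use strict;
use warnings;

my $str = 'ab START cd END';
$str =~ /START(.+?)END/ or die "no match";

# @- holds start offsets and @+ end offsets; index 0 is the whole
# match, index 1 the first capture group.
my ($from, $to) = ($-[1], $+[1]);     # 8 and 12
my $len = $to - $from;                # 4
print "capture 1: [", substr($str, $from, $len), "]\n";   # [ cd ]

# substr as an lvalue: the substitution edits $str in place.
substr($str, $from, $len) =~ s/cd/CD/;
print "$str\n";                       # ab START CD END
```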


Answer 3:

Split each line on START and END, and keep a flag that tells you whether you are inside a range or not.

#!/usr/bin/perl
use warnings;
use strict;

my $inside;
while (<>) {
    my @strings = split /(START|END)/;
    for my $string (@strings) {
        if ('START' eq $string) {
            $inside = 1;

        } elsif ('END' eq $string) {
            undef $inside;

        } elsif ($inside) {
            $string =~ s/bingo/okyes/g;

        }

        print $string;
    }
}

Or a bit shorter using a hash as a switch:

#!/usr/bin/perl
use warnings;
use strict;
use Syntax::Construct qw{ // };

my $inside;
while (<>) {
    my @strings = split /(START|END)/;
    for my $string (@strings) {
        $inside = { START => 1,
                    END   => 0,
                  }->{$string} // $inside;

        $string =~ s/bingo/okyes/g if $inside;
        print $string;
    }
}
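Because the delimiters are captured, split returns them as fields of their own, which is what lets the loop toggle the flag. The sketch below wraps the same logic in a hypothetical replace_between helper that works on a whole string (so a range may span several lines) and treats the delimiters and the pattern as literal strings via \Q...\E; both of those are assumptions, not part of the answer.

```perl
use strict;
use warnings;

# For example, split /(START|END)/, 'xx START xx bingo xx END bingo'
# returns ('xx ', 'START', ' xx bingo xx ', 'END', ' bingo').

# Hypothetical helper: toggle an "inside" flag on each captured
# delimiter and substitute only in fragments seen while it is set.
sub replace_between {
    my ($text, $start, $end, $pattern, $replacement) = @_;
    my @parts  = split /(\Q$start\E|\Q$end\E)/, $text;
    my $inside = 0;
    for my $part (@parts) {          # $part aliases the elements of @parts
        if    ($part eq $start) { $inside = 1 }
        elsif ($part eq $end)   { $inside = 0 }
        elsif ($inside)         { $part =~ s/\Q$pattern\E/$replacement/g }
    }
    return join '', @parts;
}

print replace_between("xx START xx bingo\nxx END bingo", qw(START END bingo okyes)), "\n";
```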


Answer 4:

Eventually I used the following code to accomplish what I intended:

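use strict;
use warnings;

$_ = "xx START xx bingo xx bingo xx END xx bingo xx START xx bingo xx END bingo";
print "$_\n";
# /e evaluates the replacement as code: copy the matched range ($&) into $s,
# substitute inside the copy, and return it. /s lets .*? cross newlines.
s/START.*?END/(my $s = $&) =~ s,bingo,okyes,g; $s/gse;
print "$_\n";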

This is a one-regex solution: it uses an embedded expression (the /e modifier) in the outer s///g, with a nested s///g inside it.
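On Perl 5.14 or later, the same one-regex idea can be sketched with the non-destructive /r modifier on the inner substitution, which returns a modified copy of $1 so neither $& nor a temporary variable is needed. The /s flag is included so that .*? can cross the newlines the question mentions; the sample input with an embedded newline is made up for illustration.

```perl
use strict;
use warnings;

my $text = "xx START xx bingo\nxx bingo xx END xx bingo xx START xx bingo xx END bingo";

# Outer s///gse: find each START...END range (across newlines) and
# evaluate the replacement as code. Inner s///gr: return a copy of the
# captured range with every bingo replaced, leaving $1 untouched.
(my $result = $text) =~ s/(START.*?END)/$1 =~ s{bingo}{okyes}gr/gse;

print "$result\n";
```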

Sorry for the late post; I deeply appreciate the replies by @Sobrique, @Borodin and @choroba, which were enlightening and helpful.