How can I match strings that don't match a par

I know that it is easy to match anything except a given character using a regular expression.

$text = "ab ac ad";
$text =~ s/[^c]*//g; # Match anything, except c.

$text is now "c".

I don't know how to "except" strings instead of characters. How would I "match anything, except 'ac'"? I tried [^(ac)] and [^"ac"] without success.

Is it possible at all?

Tags: regex perl

6 answers

Ridiculous、

#2 · 2020-05-27 04:25

If you just want to check if the string does not contain "ac", just use a negation.

$text = "ab ac ad";

print "ac not found" if $text !~ /ac/;

print "ac not found" unless $text =~ /ac/;


我只想做你的唯一

#3 · 2020-05-27 04:30

You can easily modify this regex for your purpose.

use Test::More 0.88;

#Match any whole text that does not contain a string
my $re=qr/^(?:(?!ac).)*$/;
my $str='ab ac ad';

ok($str !~ $re); # !$str =~ $re would parse as (!$str) =~ $re

$str='ab af ad';
ok($str=~$re);

done_testing();
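
The same idea can be applied to an arbitrary literal string. The following is only a minimal sketch: the helper name lacks_substring is hypothetical and not part of the answer above. It assumes the string to exclude should be treated literally, so quotemeta escapes any regex metacharacters, and it adds /s so that "." also crosses newlines.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $str does not contain the literal $needle.
sub lacks_substring {
    my ($str, $needle) = @_;
    my $quoted = quotemeta $needle;
    # At every position, refuse to advance if $needle starts there.
    my $re = qr/\A(?:(?!$quoted).)*\z/s;
    return $str =~ $re ? 1 : 0;
}

print lacks_substring('ab ac ad', 'ac') ? "no ac\n" : "has ac\n";
print lacks_substring('ab af ad', 'ac') ? "no ac\n" : "has ac\n";
```

For a plain yes/no check, the simpler $str !~ /ac/ does the same job. The lookahead form is mainly useful when the "does not contain" condition has to be embedded inside a larger pattern.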


我命由我不由天

#4 · 2020-05-27 04:33

The following solves the question as understood in the second sense described in Bart K.'s comment:

>> $text='ab ac ad';
>> $text =~ s/(ac)|./$1/g;
>> print $text;
ac

Also, 'abacadac' -> 'acac'

It should be noted though that in most practical applications negative lookaheads prove to be more useful than this approach.


虎瘦雄心在

#5 · 2020-05-27 04:40

Update: In a comment on your question, you mentioned you want to clean wiki markup and remove balanced sequences of {{ ... }}. Section 6 of the Perl FAQ covers this: Can I use Perl regular expressions to match balanced text?

Consider the following program:

#! /usr/bin/perl

use warnings;
use strict;

use Text::Balanced qw/ extract_tagged /;

# for demo only
*ARGV = *DATA;

while (<>) {
  if (s/^(.+?)(?=\{\{)//) {
    print $1;
    my(undef,$after) = extract_tagged $_, "{{" => "}}";

    if (defined $after) {
      $_ = $after;
      redo;
    }
  }

  print;
}

__DATA__
Lorem ipsum dolor sit amet, consectetur
adipiscing elit. {{delete me}} Sed quis
nulla ut dolor {{me too}} fringilla
mollis {{ quis {{ ac }} erat.

Its output:

Lorem ipsum dolor sit amet, consectetur
adipiscing elit.  Sed quis
nulla ut dolor  fringilla
mollis {{ quis  erat.

For your particular example, you could use

$text =~ s/[^ac]|a(?!c)|(?<!a)c//g;

That is, delete every character other than a and c, and delete an a or a c only when it is not part of an ac sequence.
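
To see how the three alternatives interact, here is a small sketch that wraps the substitution in a sub and runs it on a few inputs. The wrapper name keep_only_ac and the extra sample strings are assumptions added for illustration.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution above.
sub keep_only_ac {
    my ($text) = @_;
    # Delete: any character other than a or c,
    # an "a" that is not followed by "c",
    # and a "c" that is not preceded by "a".
    $text =~ s/[^ac]|a(?!c)|(?<!a)c//g;
    return $text;
}

print keep_only_ac($_), "\n" for 'ab ac ad', 'abacadac', 'ca cc aa';
```

The lookahead and lookbehind are evaluated against the original string, not against the partially edited result, which is why a kept "a" still counts as the predecessor of the "c" that follows it.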

In general, this is tricky to do with a regular expression.

Say you don't want foo followed by optional whitespace and then bar in $str. Often, it's clearer and easier to check separately. For example:

die "invalid string ($str)"
  if $str =~ /^.*foo\s*bar/;

You might also be interested in an answer to a similar question, where I wrote

my $nofoo = qr/
  (      [^f] |
    f  (?! o) |
    fo (?! o  \s* bar)
  )*
/x;

my $pattern = qr/^ $nofoo bar /x;

To understand the complication, read How Regexes Work by Mark Dominus. The engine compiles regular expressions into state machines. When it's time to match, it feeds the input string to the state machine and checks whether the state machine finishes in an accept state. So to exclude a string, you have to specify a machine that accepts all inputs except a particular sequence.

What might help is a hypothetical /v regular expression switch that creates the state machine as usual but then complements the accept-state bit for all states. It's hard to say whether this would really be useful compared with separate checks, because a /v regular expression might still surprise people, just in different ways.

If you're interested in the theoretical details, see An Introduction to Formal Languages and Automata by Peter Linz.


SAY GOODBYE

#6 · 2020-05-27 04:46

$text =~ s/[^c]*//g; // Match anything, except c.

@ssn, A couple of comments about your question:

"//" is not a comment in Perl. Only "#" is.
"[^c]*" - there is no need for the "*" there. "[^c]" means the character class composed of all characters except the letter "c". Then you use the /g modifier, meaning all such occurrences in the text will be replaced (in your example, with nothing). The "zero or more" ("*") modifier is therefore redundant.

How would I "match anything, except 'ac'" ? Tried [^(ac)] and [^"ac"] without success.

Please read the documentation on character classes (see "perldoc perlre" on your command line, or online at http://perldoc.perl.org/perlre.html). You'll see it states that, for the list of characters within the square brackets, the RE will "match any character from the list". This means order is not relevant and there are no "strings", only a list of characters. "()" and double quotes also have no special meaning inside the square brackets.
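
A quick way to convince yourself of this is to test the class against a few short strings. This is just an illustrative sketch, and the sample strings are made up for the demonstration:

```perl
use strict;
use warnings;

# [^(ac)] is a negated class over the four characters "(", "a", "c", ")".
# A string matches only if it contains some character outside that set.
my @hits = grep { /[^(ac)]/ } 'a', 'c', '(', ')', 'ca', 'b';
print "@hits\n";
```

Only strings containing a character outside the set are kept, so 'ca' is rejected just as 'a' and 'c' are: the class never looks at "ac" as a unit.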

Now I'm not exactly sure why you're talking about matching but then giving an example of substitution. But to see if a string does not match the sub-string "ac" you just need to negate the match:

use strict; use warnings;
my $text = "ab ac ad";
if ($text !~ m/ac/) {
   print "Yey the text doesn't match 'ac'!\n"; # this shouldn't be printed
}

Say you have a string of text within which are embedded multiple occurrences of a substring. If you just want the text surrounding the sub-string, just remove all occurrences of the sub-string:

$text =~ s/ac//g;

If you want the reverse - to remove all text except for all occurrences of the sub-string, I would suggest something like:

use strict; use warnings;
my $text = "ab ac ad ac ae";
my $sub_str = "ac";
my @captured = $text =~ m/($sub_str)/g;
my $num = scalar @captured;
print (($sub_str x $num) . "\n");

This basically counts the number of times the sub-string appears in the text and prints the sub-string that number of times using the "x" operator. It's not very elegant; I'm sure a Perl guru could come up with something better.
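
One possibly simpler variant, offered only as a sketch, skips the counting step and joins the matches directly. It assumes the sub-string should be matched literally, hence the \Q...\E around it.

```perl
use strict;
use warnings;

my $text    = "ab ac ad ac ae";
my $sub_str = "ac";

# In list context, m//g without capture groups returns every match;
# \Q...\E makes $sub_str match as a literal string.
my $kept = join '', $text =~ /\Q$sub_str\E/g;
print "$kept\n";
```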

@ennuikiller:

my $text = "ab ac ad";
$text !~ s/(ac)//g; # Match anything, except ac.

This is incorrect, since it generates a warning ("Useless use of negative pattern binding (!~) in void context") under "use warnings" and doesn't do anything except remove all substrings "ac" from the text, which could be more simply written as I wrote above with:

$text =~ s/ac//g;


家丑人穷心不美

#7 · 2020-05-27 04:46

You can use index():

$text = "ab ac ad";
print "ac not found" if ( index($text,"ac") == -1 );
