I know that it is easy to match anything except a given character using a regular expression.
$text = "ab ac ad";
$text =~ s/[^c]*//g; # Match anything, except c.
$text is now "c".
I don't know how to "except" strings instead of characters. How would I "match anything, except 'ac'"? I tried [^(ac)] and [^"ac"] without success.
Is it possible at all?
If you just want to check if the string does not contain "ac", just use a negation.
or
You can easily modify this regex for your purpose.
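As a sketch of what a single-regex version can look like (an assumption about the general technique, not necessarily the pattern this answer used): a negative lookahead placed before every character succeeds only when no position in the string starts "ac". The helper name lacks_ac is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true (1) when "ac" appears nowhere in $text.
# (?!ac) is checked before each character, so the match can only
# reach the end of the string if no position starts "ac".
sub lacks_ac {
    my ($text) = @_;
    return $text =~ /\A(?:(?!ac).)*\z/s ? 1 : 0;
}

print lacks_ac("ab ad"), lacks_ac("ab ac ad"), "\n";  # prints 10
```

To exclude a different string, replace the ac inside the lookahead.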
The following solves the question as understood in the second sense described in Bart K.'s comment:
Also, 'abacadac' -> 'acac'
It should be noted though that in most practical applications negative lookaheads prove to be more useful than this approach.
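One way to get the 'abacadac' -> 'acac' behaviour without lookaheads is an alternation that tries the wanted string first and otherwise consumes a single character. This is a minimal sketch under that assumption, not necessarily the code this answer contained; keep_only_ac is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: keep every "ac", delete every other character.
sub keep_only_ac {
    my ($text) = @_;
    # Try "ac" first; if that fails, consume one character and drop it.
    $text =~ s/(ac)|./defined $1 ? $1 : ''/gse;
    return $text;
}

print keep_only_ac('abacadac'), "\n";  # prints acac
```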
Update: In a comment on your question, you mentioned you want to clean wiki markup and remove balanced sequences of {{ ... }}. Section 6 of the Perl FAQ covers this: Can I use Perl regular expressions to match balanced text?
Consider the following program:
Its output:
For your particular example, you could use
That is, only delete an a or c when they aren't part of an ac sequence.
In general, this is tricky to do with a regular expression.
Say you don't want foo followed by optional whitespace and then bar in $str. Often, it's clearer and easier to check separately. For example:
You might also be interested in an answer to a similar question, where I wrote
To understand the complication, read How Regexes Work by Mark Dominus. The engine compiles regular expressions into state machines. When it's time to match, it feeds the input string to the state machine and checks whether the state machine finishes in an accept state. So to exclude a string, you have to specify a machine that accepts all inputs except a particular sequence.
What might help is a /v regular expression switch that creates the state machine as usual but then complements the accept-state bit for all states. It's hard to say whether this would really be useful as compared with separate checks because a /v regular expression may still surprise people, just in different ways.
If you're interested in the theoretical details, see An Introduction to Formal Languages and Automata by Peter Linz.
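Returning to the foo and bar example, the separate-check approach might look like the following minimal sketch. It assumes the string must mention foo but must never contain foo, optional whitespace, and then bar; the helper name foo_without_bar is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: two simple checks instead of one "match anything
# except ..." pattern.
sub foo_without_bar {
    my ($str) = @_;
    return 0 unless $str =~ /foo/;     # positive check: foo must appear
    return 0 if $str =~ /foo\s*bar/;   # negative check: reject foo ... bar
    return 1;
}

print foo_without_bar("foo baz"), foo_without_bar("foo   bar"), "\n";  # prints 10
```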
@ssn, A couple of comments about your question:
Please read the documentation on character classes (see "perldoc perlre" on your command line, or online at http://perldoc.perl.org/perlre.html). You'll see it states that for the list of characters within the square brackets the RE will "match any character from the list". That means order is not relevant and there are no "strings", only a list of characters. "()" and double quotes also have no special meaning inside the square brackets.
Now I'm not exactly sure why you're talking about matching but then giving an example of substitution. But to see if a string does not match the sub-string "ac" you just need to negate the match:
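A minimal sketch of the negated match, using the !~ operator; the sample string is made up for illustration.

```perl
use strict;
use warnings;

my $text = "ab ad";   # sample input, chosen for illustration

# !~ is true when the pattern does NOT match anywhere in $text.
if ($text !~ /ac/) {
    print "'$text' does not contain 'ac'\n";
}
```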
Say you have a string of text within which are embedded multiple occurrences of a substring. If you just want the text surrounding the sub-string, just remove all occurrences of the sub-string:
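For instance, as a small sketch (remove_ac is a hypothetical helper name):

```perl
use strict;
use warnings;

# Hypothetical helper: return $text with every "ac" removed.
sub remove_ac {
    my ($text) = @_;
    $text =~ s/ac//g;   # /g removes all occurrences, not just the first
    return $text;
}

print remove_ac("abacadac"), "\n";  # prints abad
```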
If you want the reverse - to remove all text except for all occurrences of the sub-string, I would suggest something like:
This basically counts the number of times the sub-string appears in the text and prints the sub-string that number of times using the "x" operator. It's not very elegant; I'm sure a Perl guru could come up with something better.
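A sketch of that count-and-repeat idea, assuming the usual list-assignment idiom for counting matches (the helper name only_ac is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: return "ac" repeated once per occurrence in $text.
sub only_ac {
    my ($text) = @_;
    # Assigning the match list to () in scalar context yields the match count.
    my $count = () = $text =~ /ac/g;
    return 'ac' x $count;
}

print only_ac('abacadac'), "\n";  # prints acac
```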
@ennuikiller:
This is incorrect, since it generates a warning ("Useless use of negative pattern binding (!~) in void context") under "use warnings" and doesn't do anything except remove all substrings "ac" from the text, which could be more simply written as I wrote above with:
You can use index() to check whether the string contains "ac".
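A minimal sketch: index() returns -1 when the substring is absent, so no regex is needed for a plain containment test. The helper name contains_ac is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $text contains the literal string "ac", else 0.
sub contains_ac {
    my ($text) = @_;
    return index($text, 'ac') != -1 ? 1 : 0;
}

print contains_ac("ab ac ad"), contains_ac("ab ad"), "\n";  # prints 10
```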