Replacing all matches except if surrounded by or o

Posted 2019-04-09 16:29

Question:

Given a text string (a markdown document), I need to achieve one of these two options:

  • to replace all the matches of a particular expression ((\W)(theWord)(\W)) all across the document EXCEPT the matches that are inside a markdown image syntax ![Blah theWord blah](url).

  • to replace all the matches of a particular expression ({{([^}}]+)}}\[\[[^\]\]]+\]\]) ONLY inside the markdown images, i.e. ![Blah {{theWord}}[[1234]] blah](url).

Both expressions currently match everything, whether or not it is inside the markdown image syntax, and I've already tried everything I could think of.

Here is an example of the first option

And here is an example of the second option

Any help and/or clue will be highly appreciated.

Thanks in advance!

Answer 1:

I modified the first expression slightly, since it seemed to contain some unnecessary capturing groups, and then made both expressions work by adding a lookahead:

-First one (Live demo):

\b(vitae)\b(?![^[]*]\s*\()

-Second one (Live demo):

{{([^}}]+)}}\[\[[^\]\]]+\]\](?=[^[]*]\s*\()

Lookahead part explanations:

(?!            # Start of a negative lookahead
    [^[]*]     # Any characters other than '[', up to a closing ']'
    \s*        # Optional whitespace
    \(         # Followed by an opening parenthesis
)              # End of lookahead: the match is rejected when it sits between brackets that are followed by '('

In the second expression, (?= starts a positive lookahead instead, so a match is kept only when it sits between such brackets.
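
As a rough illustration, the two lookahead expressions could be applied with preg_replace as follows. This is only a sketch: the helper names, the sample strings, and the replacement values are made up for the example and are not part of the answer.

```php
<?php
// Hypothetical helper: replace a whole word everywhere except where it is
// followed by "](" with no "[" in between, i.e. inside [alt text](url).
function replaceOutsideBrackets(string $text, string $word, string $replacement): string
{
    $pattern = '/\b(' . preg_quote($word, '/') . ')\b(?![^[]*]\s*\()/';
    return preg_replace($pattern, $replacement, $text);
}

// Hypothetical helper: replace {{word}}[[id]] tokens only inside such brackets.
function replaceTokensInsideBrackets(string $text, string $replacement): string
{
    $pattern = '/{{([^}}]+)}}\[\[[^\]\]]+\]\](?=[^[]*]\s*\()/';
    return preg_replace($pattern, $replacement, $text);
}

// Sample inputs (assumed for illustration only).
$doc1 = 'Lorem vitae ![Alt vitae text](http://example.com/a.png) dolor vitae.';
echo replaceOutsideBrackets($doc1, 'vitae', 'X'), "\n";
// Lorem X ![Alt vitae text](http://example.com/a.png) dolor X.

$doc2 = '{{foo}}[[12]] and ![Alt {{theWord}}[[1234]] text](http://example.com/a.png).';
echo replaceTokensInsideBrackets($doc2, '{{new}}[[0]]'), "\n";
// {{foo}}[[12]] and ![Alt {{new}}[[0]] text](http://example.com/a.png).
```

Note that the lookahead only checks for a following "](" and does not look for the leading "!", so ordinary markdown links such as [text](url) are treated the same way as images.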



Answer 2:

You can leverage the discard technique, which is really useful for cases like this. It consists of a pattern of the following form:

patternToSkip1 (*SKIP)(*FAIL)|patternToSkip2 (*SKIP)(*FAIL)| MATCH THIS PATTERN

So, according to your needs:

to replace all the matches of a particular expression ((\W)(theWord)(\W)) all across the document EXCEPT the matches that are inside a markdown image syntax

You can easily achieve this in PCRE with the (*SKIP)(*FAIL) verbs, so for your case you can use a regex like this:

\[.*?\](*SKIP)(*FAIL)|\btheWord\b

Or using your pattern:

\[.*?\](*SKIP)(*FAIL)|(\W)(theWord)(\W)

The idea behind this regex is to tell the regex engine to skip the content within [...].

Working demo
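
A minimal sketch of the discard technique in PHP, assuming a made-up helper name and sample string. The left branch consumes a [...] span and throws it away, and only the right branch produces real matches.

```php
<?php
// Hypothetical helper: replace $word except inside any [...] span.
function replaceSkippingBrackets(string $text, string $word, string $replacement): string
{
    // Left branch: consume a [...] span, then (*SKIP)(*FAIL) discards it
    // and resumes the search after the closing bracket.
    // Right branch: the text that should actually be matched.
    $pattern = '/\[.*?\](*SKIP)(*FAIL)|\b' . preg_quote($word, '/') . '\b/';
    return preg_replace($pattern, $replacement, $text);
}

// Sample input (assumed for illustration only).
$doc = 'theWord ![an theWord image](http://example.com/i.png) '
     . 'and [a theWord link](http://example.com) theWord';
echo replaceSkippingBrackets($doc, 'theWord', 'X'), "\n";
// X ![an theWord image](http://example.com/i.png) and [a theWord link](http://example.com) X
```

Because the skipped pattern is any [...] span, this version also leaves ordinary link text untouched, which may or may not be what you want.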



Answer 3:

The first regex is easily fixed with a SKIP-FAIL trick:

\!\[.*?\]\(http[^)]*\)(*SKIP)(*FAIL)|\bvitae\b

Replace the match with the word of your choice. This is a perfectly valid way in PHP (PCRE) regex to match something outside some markers.

See Demo 1

As for the second one, it is harder, but achievable with \G, which ensures we match consecutively inside some markers:

(\!\[.*?|(?<!^)\G)((?>(?!\]\(http).)*?){{([^}]+?)}}\[{2}[^]]+?\]{2}(?=.*?\]\(http[^)]*?\))

To replace with $1$2{{NEW_REPLACED_TEXT}}[[NEW_DIGITS]]

See Demo 2

PHP:

$re1 = "#\!\[.*?\]\(http[^)]*\)(*SKIP)(*FAIL)|\bvitae\b#i";
$re2 = "#(\!\[.*?|(?<!^)\G)((?>(?!\]\(http).)*?){{([^}]+?)}}\[{2}[^]]+?\]{2}(?=.*?\]\(http[^)]*?\))#i";
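
For illustration, the two patterns might be run through preg_replace like this. The sample document and the WORD replacement are assumptions made for the sketch; the second replacement string is the one suggested above.

```php
<?php
// Same patterns as above, written as single-quoted strings.
$re1 = '#\!\[.*?\]\(http[^)]*\)(*SKIP)(*FAIL)|\bvitae\b#i';
$re2 = '#(\!\[.*?|(?<!^)\G)((?>(?!\]\(http).)*?){{([^}]+?)}}\[{2}[^]]+?\]{2}(?=.*?\]\(http[^)]*?\))#i';

// Sample input (assumed): one token outside the image, two inside it.
$doc = '{{out}}[[1]] vitae ![Alt vitae {{a}}[[12]] mid {{b}}[[34]] end]'
     . '(http://example.com/x.png) vitae';

// Option 1: replace the word everywhere except inside the image.
$out1 = preg_replace($re1, 'WORD', $doc);

// Option 2: replace every {{...}}[[...]] token inside the image only.
// $1 and $2 restore the text that was consumed before each token.
$out2 = preg_replace($re2, '$1$2{{NEW_REPLACED_TEXT}}[[NEW_DIGITS]]', $doc);

echo $out1, "\n", $out2, "\n";
```

In the second call, the first token inside the image is reached through the ![ branch and each following token through \G, which only matches where the previous match ended. That is why the token outside the image is left alone.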