Ignore img tags in preg_replace

Posted 2020-02-14 19:45

I want to replace a word in an HTML string, but skip the replacement when the word appears in the attributes of an 'img' element.

Example:

$word = 'google';
$html = 'I like google and here is its logo <img src="images/google.png" alt="Image of google logo" />';

$replacement = '<a href="http://google.com">Google</a>';
$result =  preg_replace('/\s'.($word).'/u', $replacement, $html);

preg_replace also replaces the word "google" inside the 'src' and 'alt' attributes. I want it to replace the word only outside the 'img' element.

2 Answers
我只想做你的唯一
#2 · 2020-02-14 20:25

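Use a positive lookahead, (?=.*?<.*?/>), so that the word is replaced only when a tag ending in "/>" opens somewhere after it on the same line. This fits the example, where the word to replace comes before the single img tag. An occurrence after the last tag is not replaced, and an occurrence inside a tag is replaced if another self-closing tag follows it.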

$html = 'I like google and here is its logo <img src="images/google.png" alt="Image of google logo" />';

$result = preg_replace('%(?=.*?<.*?/>)google%im', 'ANOTHER WORD', $html);


EXPLANATION:

(?=.*?<.*?/>)google
-------------------

Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=.*?<.*?/>)»
   Match any single character that is NOT a line break character (line feed) «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
   Match the character “<” literally «<»
   Match any single character that is NOT a line break character (line feed) «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
   Match the character string “/>” literally «/>»
Match the character string “google” literally (case insensitive) «google»

ANOTHER WORD

Insert the character string “ANOTHER WORD” literally «ANOTHER WORD»

More info about Regex Lookaround
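
To make the behaviour of the lookahead concrete, here is a small sketch that runs the same pattern on three strings. The first is the string from the question; the second and third are made-up inputs, added only to show where this approach stops working.

```php
<?php
// Same pattern as in the answer above; inputs 2 and 3 are illustrative only.
$pattern = '%(?=.*?<.*?/>)google%im';

// Case 1: the question's string. Only the word before the tag is replaced,
// because no further "<" follows the occurrences inside the tag.
$html1 = 'I like google and here is its logo <img src="images/google.png" alt="Image of google logo" />';
$result1 = preg_replace($pattern, 'ANOTHER WORD', $html1);

// Case 2 (made-up input): a second tag follows, so the lookahead also
// succeeds for the word inside the first tag's src attribute.
$html2 = '<img src="google.png" /> <img src="other.png" />';
$result2 = preg_replace($pattern, 'ANOTHER WORD', $html2);

// Case 3 (made-up input): no tag follows the word, so nothing is replaced.
$html3 = '<img src="a.png" /> google';
$result3 = preg_replace($pattern, 'ANOTHER WORD', $html3);

echo $result1, PHP_EOL, $result2, PHP_EOL, $result3, PHP_EOL;
```

So the lookahead is fine when every occurrence you want to replace comes before a single trailing tag, but it is not a general "outside of tags" test.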

等我变得足够好
#3 · 2020-02-14 20:33

You can use a discard pattern built on the (*SKIP)(*FAIL) verbs. For instance, you can use a regex like this:

<.*?google.*?\/>(*SKIP)(*FAIL)|google


The idea behind this pattern is to discard any "google" that sits inside a <.../> tag and keep the rest:

<.*?google.*?\/>(*SKIP)(*FAIL)  --> matches a whole tag that contains google, then throws that match away
|google                         --> matches every other occurrence of google
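
Applied to the variables from the question, this could look roughly like the sketch below. The use of preg_quote and the "~" delimiter are my own additions, not part of the pattern above; they are there so that a $word containing regex metacharacters is still matched literally.

```php
<?php
$word = 'google';
$html = 'I like google and here is its logo <img src="images/google.png" alt="Image of google logo" />';
$replacement = '<a href="http://google.com">Google</a>';

// Escape the word so any regex metacharacters in it are treated literally.
$quoted = preg_quote($word, '~');

// Left branch: a self-closing tag containing the word is matched, then
// discarded by (*SKIP)(*FAIL), and matching resumes after the tag.
// Right branch: any remaining occurrence of the word is the real match.
$pattern = '~<.*?' . $quoted . '.*?/>(*SKIP)(*FAIL)|' . $quoted . '~';

$result = preg_replace($pattern, $replacement, $html);
echo $result, PHP_EOL;
```

Only the "google" in the sentence becomes a link; the occurrences in 'src' and 'alt' are left alone. Note that "." does not match a line break here, so a tag spread over several lines would need the s modifier.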

You can add as many "discard" patterns as you want, like:

discard patt1(*SKIP)(*FAIL)|discard patt2(*SKIP)(*FAIL)|...(*SKIP)(*FAIL)|keep this
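
As a sketch of chaining two discard branches, the example below skips both existing links and img tags. The input string and the two tag patterns are hypothetical, chosen only for illustration, and [^>]* assumes that no attribute value contains a ">" character.

```php
<?php
// Made-up input: the word appears inside a link, in plain text, and in an img tag.
$html = '<a href="http://google.com">google</a> likes google <img src="google.png" />';

// Branch 1 discards whole <a>...</a> elements, branch 2 discards <img> tags,
// and the last branch keeps whatever "google" is left over.
$pattern = '~<a\b[^>]*>.*?</a>(*SKIP)(*FAIL)|<img\b[^>]*>(*SKIP)(*FAIL)|google~i';

$result = preg_replace($pattern, 'ANOTHER WORD', $html);
echo $result, PHP_EOL;
```

Here only the middle "google" is replaced, since the other two fall inside a discarded branch.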