PHP preg_replace regex exclude html tag

Posted 2019-07-22 00:48

Question:

The goal: I have some titles, and I would like to turn each occurrence of a title in a text into a link. But where the title is already part of a link, I don't want to change it. I am looking for the right regex.

I have a PHP code like this:

foreach($res as $r) {
  $new_string = '<a href="#" onclick="getMarkers(1,\\\' \\\',1);locate('.$r->latitude.','.$r->longitude.','.$r->zoom.','.$r->id.',0);">$0</a>';
  $introduction = preg_replace("/\b$r->title\b(?![^<>]*(?:<|$))/i", $new_string, $introduction);
}

This part of my code doesn't work:

preg_replace("/\b$r->title\b(?![^<>]*(?:<|$))/i",$new_string,$introduction)

The problem: this regex also changes titles that are already inside existing links in the HTML.

Thank you for your patience; I look forward to your answers!

Thanks!


UPDATE: Thank you to HamZa for the fantastic link!

My solution is:

 $introduction = (preg_replace("/[^>]*>.*?<\/a>(*SKIP)(*FAIL)|$r->title/im",$new_string,$introduction));

Thanks, everybody! :)
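For readers who want to try the (*SKIP)(*FAIL) idea in isolation, here is a minimal sketch. The helper name linkTitle, its parameters, and the sample text are hypothetical and not from the original post. It also differs from the pattern in the post in three ways, all of them assumptions: the first alternative explicitly starts at `<a`, the title is passed through preg_quote() so that regex metacharacters in it are matched literally, and the `s` flag is used so that a link spanning several lines is still skipped as one unit.

```php
<?php
// Hypothetical helper (not from the original post): link every whole-word
// occurrence of $title in $html, except inside an existing <a>...</a> element.
function linkTitle(string $html, string $title, string $href): string
{
    // Escape regex metacharacters in the title, including the "~" delimiter.
    $quoted = preg_quote($title, '~');

    // Left alternative: consume a complete <a ...>...</a> element, then
    // (*SKIP)(*FAIL) rejects it and resumes the search after the element.
    // Right alternative: the title as a whole word, outside any link.
    $pattern = '~<a\b[^>]*>.*?</a>(*SKIP)(*FAIL)|\b' . $quoted . '\b~is';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($href): string {
            // $m[0] is the matched title, kept with its original casing.
            return '<a href="' . htmlspecialchars($href) . '">' . $m[0] . '</a>';
        },
        $html
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $html;
}

$text = 'Visit Budapest. See <a href="/x">Budapest</a> too.';
echo linkTitle($text, 'Budapest', '#'), "\n";
// Visit <a href="#">Budapest</a>. See <a href="/x">Budapest</a> too.
```

Note that this sketch only protects text between `<a ...>` and `</a>`. A title that appears inside an attribute of some other tag would still be replaced, and the `\b` anchors assume the title begins and ends with a word character.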

Answer 1:

This may be an overly simple solution, but you can use a negative lookbehind to make sure that the URL is not preceded by href=. If it is not, capture it with whatever domain regex you prefer.

I used a pretty clunky domain name validator, so this is going to look like a mess, but I'll explain it.

$string = 'This is just a bunch of random text. <A HREF="http://www.google.com">google</A> This is 
just a bunch of random text. http://www.yahoo.com This is just a bunch of random text. 
<A HREF="http://www.cnn.com">cnn.com</A> This is just a bunch of random text. http://www.msn.com This 
is just a bunch of random text. ';

$string = preg_replace('~(?<!href="|href=\'|href=)((?:http(?:s)?://)(?:www\.)?[-A-Z0-9.]+(?:\.[-A-Z0-9]{2,4})[-A-Z0-9_./]?(?:[-A-Z0-9#?/]+)?)~i', '<a href="$1">$1</a>', $string);

print $string;

This outputs:

This is just a bunch of random text. <A HREF="http://www.google.com">google</A> This is 
just a bunch of random text. <a href="http://www.yahoo.com">http://www.yahoo.com</a> This is just a bunch of random text. 
<A HREF="http://www.cnn.com">cnn.com</A> This is just a bunch of random text. <a href="http://www.msn.com">http://www.msn.com</a> This 
is just a bunch of random text.

Okay, now an explanation of the REGEX:

(?<!href="|href=\'|href=)    ((?:http(?:s)?://)(?:www\.)?[-A-Z0-9.]+(?:\.[-A-Z0-9]{2,4})[-A-Z0-9_./]?(?:[-A-Z0-9#?/]+)?)
              ^                                                ^
              1                                                2

There are really only 2 parts to this:

  1. This is a negative lookbehind, which makes sure that the URL is not preceded by href=", href=' (escaped, of course) or href=.
  2. The next part is just a domain validation pattern. You can use something far simpler here if you'd prefer.
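As an illustration of the "far simpler" option, the sketch below keeps the same negative lookbehind but swaps the domain validator for a loose pattern that accepts any run of characters up to whitespace, an angle bracket, or a quote. The helper name linkBareUrls and the simplified pattern are assumptions for illustration, not part of the answer above.

```php
<?php
// Hypothetical helper: wrap bare http(s) URLs in <a> tags, skipping any URL
// that directly follows href=", href=' or href=.
function linkBareUrls(string $text): string
{
    // Same lookbehind as in the answer; the URL part is deliberately loose.
    $pattern = '~(?<!href="|href=\'|href=)https?://[^\s<>"\']+~i';

    // $0 in the replacement is the whole matched URL.
    // preg_replace() returns null on error; fall back to the input.
    return preg_replace($pattern, '<a href="$0">$0</a>', $text) ?? $text;
}

$sample = 'See <A HREF="http://www.google.com">google</A> and http://www.yahoo.com today';
echo linkBareUrls($sample), "\n";
// See <A HREF="http://www.google.com">google</A> and <a href="http://www.yahoo.com">http://www.yahoo.com</a> today
```

Two limitations follow from the looseness: punctuation directly after a URL (such as a sentence-ending period) becomes part of the link, and a URL that is the visible text of an existing link is still wrapped, because the lookbehind only checks for href= immediately before it.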