Regex ignore URL already in HTML tags

Posted 2020-01-29 18:23

Question:

I'm having a small problem with my regex.

I've made a custom BBCode for my website, but I also want plain URLs to be parsed.

I'm using preg_replace, and this is the pattern used to identify URLs:

/([\w]+:\/\/[\w-?&;#~=\.\/\@]+[\w\/])/is

This works well, but if a URL sits inside an [img][/img] block, the pattern also matches it and produces a result like this:

//[img]http://url.com/toimg.jeg[/img] will produce this result:
<img src="<a href="http://url.com/toimg.jeg" target="_blank">/>
//When it should produce:
<img src="http://url.com/toimg.jeg"/>

I tried using this:

/([^"][\w]+:\/\/[\w-?&;#~=\.\/\@]+[\w\/][^"])/is

With no luck.

Any help will be appreciated.

Edit: For the solution, see the 2nd comment on stema's answer.

Answer 1:

Try this:

(?<!href=")(\b[\w]+:\/\/[\w-?&;#~=\.\/\@]+[\w\/])

See it here on Regexr

To make it more general, you can shorten the lookbehind so that it checks only for =" (an equals sign followed by a double quote), which covers src=" as well as href=":

(?<!=")(\b[\w]+:\/\/[\w-?&;#~=\.\/\@]+[\w\/])

See it on Regexr
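
As a rough illustration of how the generalized pattern behaves in a replacement, here is a minimal sketch. It uses Python's re module as a stand-in for PHP's preg_replace, so two things differ from the original: the hyphen in the character class is escaped (Python rejects `\w-?` as an invalid range, whereas PCRE treats the hyphen as a literal), and the replacement template, including the anchor text, is an assumption based on the output shown in the question. The names `URL_PATTERN` and `linkify` are hypothetical.

```python
import re

# Pattern from the answer; the hyphen is escaped for Python's re module.
URL_PATTERN = re.compile(r'(?<!=")(\b\w+://[\w\-?&;#~=./@]+[\w/])', re.IGNORECASE)


def linkify(html: str) -> str:
    """Wrap bare URLs in anchor tags, skipping URLs directly preceded by ="."""
    # Assumed replacement template; the question only shows the opening <a> tag.
    return URL_PATTERN.sub(r'<a href="\1" target="_blank">\1</a>', html)


already_tagged = '<img src="http://url.com/toimg.jeg"/>'
bare = 'Visit http://url.com/page for details'

print(linkify(already_tagged))  # left unchanged: the URL follows ="
print(linkify(bare))            # the bare URL is wrapped in an <a> tag
```

The lookbehind only protects URLs that directly follow `="`, so an attribute written with single quotes or without quotes would still be matched.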

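(?<!href=") is a negative lookbehind assertion. It ensures that the text immediately before your pattern is not href=".

\b is a word boundary. It anchors the start of your link to a transition from a non-word character to a word character. Without it, the lookbehind would be useless: the match would fail at the "h", but the regex engine would then move one character forward, where the lookbehind no longer sees href=", and match from "ttp://..." onward.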
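
The effect of the word boundary can be checked directly. The following is a small sketch, again in Python's re module rather than PHP (with the hyphen escaped for that reason); the names `WITHOUT_BOUNDARY` and `WITH_BOUNDARY` are hypothetical and exist only for this comparison.

```python
import re

# Same pattern twice: once without \b and once with it.
WITHOUT_BOUNDARY = re.compile(r'(?<!=")(\w+://[\w\-?&;#~=./@]+[\w/])')
WITH_BOUNDARY = re.compile(r'(?<!=")(\b\w+://[\w\-?&;#~=./@]+[\w/])')

tag = '<img src="http://url.com/toimg.jeg"/>'

no_b = WITHOUT_BOUNDARY.search(tag)
with_b = WITH_BOUNDARY.search(tag)

# Without \b the engine skips the "h" and matches the rest of the URL.
print(no_b.group(1))  # ttp://url.com/toimg.jeg

# With \b the only valid start is the "h", which the lookbehind rejects.
print(with_b)  # None
```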