How to exclude a word or string from a URL - Regex

Posted 2019-07-19 07:19

I'm using the following regex to match all types of URLs in PHP (it works very well):

 $reg_exUrl = "%\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s";

But now I want to exclude YouTube, youtu.be and Vimeo URLs.

After some research I tried the following, but it does not work:

$reg_exUrl = "%\b(([\w-]+://?|www[.])(?!youtube|youtu|vimeo)[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s";

I want to do this because I have another regex that matches YouTube URLs and returns an iframe, and the two regexes conflict with each other.

Any help would be greatly appreciated, thanks.

Tags: php regex url
1 answer
趁早两清
#2 · 2019-07-19 08:20

socodLib, to exclude something from a string, anchor at the beginning of the string with ^ (or use another anchor) and use a negative lookahead to assert that the string doesn't contain the word, like so:

^(?!.*?(?:youtube|some other bad word|some\.string\.with\.dots))
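As a minimal sketch of the assertion used on its own, the snippet below filters a list of strings with preg_grep(). The sample strings and variable names are made up for illustration, and the % delimiters are borrowed from the question.

```php
<?php
// Hypothetical sample data, not from the original post.
$candidates = [
    'https://example.com/page',
    'https://www.youtube.com/watch?v=abc',
    'see some.string.with.dots here',
    'plain text',
];

// The bare assertion: it succeeds at the start of the string only when
// none of the alternatives occurs anywhere later in that string.
$pattern = '%^(?!.*?(?:youtube|some other bad word|some\.string\.with\.dots))%';

// preg_grep() keeps the elements that match; array_values() reindexes them.
$allowed = array_values(preg_grep($pattern, $candidates));

print_r($allowed); // contains 'https://example.com/page' and 'plain text'
```

The pattern consumes no characters. It only answers yes or no, which is why it can later be placed in front of any other expression.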

Before we make the regex look too complex by concatenating it with yours, let's see what we would do if you wanted to match some word characters \w+ but not youtube or google. You would write:

^(?!.*?(?:youtube|google))\w+

As you can see, after the assertion (where we say what we don't want), we say what we do want with \w+.
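To see the assertion and the \w+ part working together, here is a small sketch with preg_match(). The input strings are invented for the example.

```php
<?php
$pattern = '%^(?!.*?(?:youtube|google))\w+%';

// Hypothetical inputs chosen to show both outcomes.
$inputs = ['vimeo_clip', 'my_youtube_page', 'google', 'hello world', 'hello google'];

$results = [];
foreach ($inputs as $input) {
    // Store the matched text, or null when the lookahead rejects the string.
    $results[$input] = preg_match($pattern, $input, $m) === 1 ? $m[0] : null;
}

var_export($results);
// 'vimeo_clip'      => 'vimeo_clip'
// 'my_youtube_page' => null
// 'google'          => null
// 'hello world'     => 'hello'
// 'hello google'    => null (the lookahead inspects the whole string, not just the \w+ part)
```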

In your case, let's add a negative lookahead to your initial regex (which I have not tuned):

$reg_exUrl = "%(?i)\b(?!.*?(?:youtu\.?be|vimeo))(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s";

I took the liberty of making the regex case insensitive with (?i). You could also have added i next to your s modifier at the end. The youtu\.?be expression allows for an optional dot, so it covers both youtube and youtu.be.
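Here is a rough sketch of how the combined pattern behaves, with made-up sample inputs. One thing to be aware of: because the lookahead uses .*? together with the s modifier, it scans everything after the current position, so a single YouTube or Vimeo link later in the text also rejects the URLs before it. The second pattern, $perUrl, is my own assumed variant (not part of the pattern above) that limits the lookahead to the characters of the URL being tested.

```php
<?php
// Pattern from the answer: the lookahead scans the rest of the input.
$wholeInput = '%(?i)\b(?!.*?(?:youtu\.?be|vimeo))(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s';

// Assumed variant: the lookahead stops at the first whitespace or ()<>
// character, so it only inspects the URL that starts at this position.
$perUrl = '%(?i)\b(?![^\s()<>]*(?:youtu\.?be|vimeo))(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))%s';

// Hypothetical inputs, one URL per string.
$single = [
    'https://example.com/page',
    'https://www.youtube.com/watch?v=abc',
    'http://youtu.be/abc',
    'https://vimeo.com/123',
];
$kept = array_values(preg_grep($wholeInput, $single));
// $kept holds only 'https://example.com/page'

// Hypothetical text containing two URLs.
$text = 'Read https://example.com/a then watch https://youtu.be/xyz';
$countWhole  = preg_match_all($wholeInput, $text, $m1); // 0 matches
$countPerUrl = preg_match_all($perUrl, $text, $m2);     // 1 match: https://example.com/a
```

If you test one URL at a time, both patterns give the same answer. The difference only shows up when you run the regex over a block of text that mixes several links.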

I am certain you can apply this recipe to your expression and other regexes in the future.

Reference

  1. Regex lookarounds
  2. StackOverflow regex FAQ