I am using preg_replace to replace a list of words in a text that may contain some urls. The problem is that I don't want to replace these words if they're part of a url.
These examples should be ignored:
foo.com
foo.com/foo
foo.com/foo/foo
For a basic example (written in PHP), I tried to skip strings containing .com followed by optional slashes and characters, using a negative lookahead assertion, but with no success:
preg_replace("/(\b)foo(\b)(?!(\w+\.\w+)*(\.com)([\.\/]\w+)*)/", "$1bar$2", $text);
This call only ignores the foo directly before .com; a foo after the slash, as in foo.com/foo, is still replaced. Any help would be really appreciated.
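To see why a lookahead alone falls short, here is a minimal PHP sketch, assuming the assertion sits in the pattern directly after foo (a simplified form of the attempt above). A lookahead only inspects the text that follows each foo, so it protects the foo in foo.com but not a foo that appears after the .com part.

```php
<?php
// Assumption: the negative lookahead lives in the pattern, right after "foo".
// It only looks at the text FOLLOWING each "foo", so it protects the "foo"
// in "foo.com" but not a "foo" that comes after ".com".
$pattern = '/\bfoo\b(?!(\w+\.\w+)*\.com([.\/]\w+)*)/';

$results = [];
foreach (['foo.com', 'foo.com/foo', 'foo.com/foo/foo', 'a foo here'] as $text) {
    $results[$text] = preg_replace($pattern, 'bar', $text);
    echo $text, ' => ', $results[$text], PHP_EOL;
}
```

With these inputs, foo.com is left alone, foo.com/foo becomes foo.com/bar, foo.com/foo/foo becomes foo.com/bar/bar, and a foo here becomes a bar here: only the foo immediately in front of .com is protected.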
In cases like these, it's much easier to think of the problem inverted. You want to match words that are not in a URL; instead, think of it as matching both the URL and the words. So your expression would look like this:
url_match_here|(?:my|words|here)
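Because the regex engine scans the text left to right and tries the alternatives in order, it consumes a whole URL before it ever gets a chance to match one of the words inside it. Thus, you never have to worry about matching the words inside a URL. If you want to maintain the text structure with preg_replace, capture the URL in a group:
(url_match_here)|(?:my|words|here)
and replace by $1 (or '\1' in a single-quoted PHP string). A URL match is written back unchanged, while a word match leaves group 1 empty and is therefore removed. To substitute the words with something else instead of removing them, use preg_replace_callback and return the URL when group 1 matched, or the replacement word otherwise.
I hope this helps.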
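A minimal PHP sketch of the URL-first alternation (PHP 7.4+ for the arrow function). The URL pattern is a hypothetical, deliberately simple one (optional http(s)://, a host with at least one dot, optional /path segments), and foo, baz and bar are placeholder words; adjust all of them to your data. It shows both the plain preg_replace variant and a preg_replace_callback variant that substitutes the words instead of deleting them.

```php
<?php
// Hypothetical URL pattern: optional scheme, host containing at least one
// dot, optional /path segments. Tune it to the URLs your text contains.
$url = '(?:https?://)?\w+(?:\.\w+)+(?:/[\w.-]*)*';

// Placeholder words to replace, escaped so they are matched literally.
$words = ['foo', 'baz'];
$wordAlt = implode('|', array_map(fn ($w) => preg_quote($w, '~'), $words));

// URL alternative first: at each position the engine tries to consume a
// whole URL before the word list, so words inside a URL are never matched.
$pattern = '~(' . $url . ')|\b(?:' . $wordAlt . ')\b~';

$text = 'foo links to foo.com/foo and baz.com, but foo stays a word.';

// 1) preg_replace with $1: URLs are written back, matched words become ''.
$removed = preg_replace($pattern, '$1', $text);

// 2) preg_replace_callback: return the URL if group 1 matched,
//    otherwise return the (placeholder) replacement word.
$replaced = preg_replace_callback($pattern, function (array $m): string {
    return ($m[1] ?? '') !== '' ? $m[1] : 'bar';
}, $text);

echo $removed, PHP_EOL;
echo $replaced, PHP_EOL;
```

With this sample text, the first call deletes both standalone foo occurrences (leaving their surrounding spaces behind) while foo.com/foo and baz.com stay untouched; the second call turns the standalone foo occurrences into bar. The callback variant is usually the one you want for substitution, because a single replacement string cannot tell which alternative matched.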
Good luck.