preg_replace: replace words that are not inside a URL

Posted 2019-08-19 11:27

I am using preg_replace to replace a list of words in a text that may contain some URLs. The problem is that I don't want to replace these words when they are part of a URL.

Occurrences of foo in URLs like these should be left unchanged:

foo.com

foo.com/foo

foo.com/foo/foo

As a basic example (in PHP), I tried to skip strings containing .com followed by optional slashes and characters, using a negative lookahead assertion, but without success:

preg_replace("/(\b)foo(\b)(?!(\w+\.\w+)*(\.com)([\.\/]\w+)*)/", "$1bar$2", $text);

This call runs, but it only skips the foo immediately before .com; in foo.com/foo the trailing foo is still replaced. Any help would be much appreciated.
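A quick reproduction of that behaviour, assuming the negative lookahead is meant to be part of the pattern (directly after foo) rather than the replacement string. It shows why this approach cannot work in general: a lookahead only sees what comes after each foo, so nothing protects a foo that appears later in the URL.

```php
<?php

// Assumption: the negative lookahead belongs in the pattern, right after foo.
$pattern = '/(\b)foo(\b)(?!(\w+\.\w+)*(\.com)([\.\/]\w+)*)/';

// The lookahead only inspects the text *after* each foo. The (\b) before it
// guarantees the next character is not a word character (or is the end), so
// (\w+\.\w+)* can only match zero times and the assertion rejects a foo only
// when ".com" follows it immediately.
$text = 'foo.com/foo/foo and foo.example.com';
$result = preg_replace($pattern, '$1bar$2', $text);

echo $result, PHP_EOL;
// prints "foo.com/bar/bar and bar.example.com"
```

Note that foo.example.com is changed as well, for the same reason: the text after that foo is ".example.com", not ".com".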

1 answer
疯言疯语
Post #2 · 2019-08-19 11:47

In cases like these, it's much easier to invert the problem. You want to match words that are not in a URL; instead, match both the URL and the words. Your expression would then look like this: url_match_here|(?:my|words|here). Because the URL alternative is tried first at each position, a URL is consumed in one piece as soon as it starts, so the word alternative never gets a chance to match inside it. Thus you never have to worry about matching the words inside a URL.

If you want to keep the text structure, you can use preg_replace with the expression (url_match_here)|(?:my|words|here) and replace by \1: URLs are written back unchanged, while each matched word is replaced by the empty group 1, i.e. removed. To put something else in place of the words (bar for foo, say), use preg_replace_callback and return the URL when group 1 matched and the replacement word otherwise.
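A minimal sketch of the "match the URL or the word" idea with preg_replace_callback. The function name replaceOutsideUrls and the URL pattern are hypothetical stand-ins for url_match_here (the answer does not give one); the URL regex here is deliberately simple and will need adjusting for real data.

```php
<?php

/**
 * Replace whole words listed in $map, leaving anything matched as a URL untouched.
 *
 * The pattern is an alternation with the URL branch first (captured as group 1),
 * so wherever a URL starts the engine consumes all of it, and the word branch
 * never gets to look at the words inside.
 */
function replaceOutsideUrls(string $text, array $map): string
{
    // Hypothetical stand-in for "url_match_here": an optional scheme, one or
    // more dot-terminated labels, a lowercase TLD, then an optional path.
    // Deliberately simple; real URL matching needs a better pattern.
    $url = '(?:https?://)?(?:[\w-]+\.)+[a-z]{2,}(?:/\S*)?';

    // Escape each word so regex metacharacters are matched literally.
    $words = implode('|', array_map(
        fn (string $w): string => preg_quote($w, '~'),
        array_keys($map)
    ));

    $pattern = '~(' . $url . ')|\b(?:' . $words . ')\b~';

    return preg_replace_callback($pattern, function (array $m) use ($map): string {
        // Group 1 matched: this is a URL, so write it back unchanged.
        if (isset($m[1]) && $m[1] !== '') {
            return $m[1];
        }
        // Otherwise the word branch matched: return its replacement.
        return $map[$m[0]];
    }, $text);
}

$text = 'foo.com/foo/foo and foo, plus http://www.foo.com/foo?x=foo';
echo replaceOutsideUrls($text, ['foo' => 'bar']), PHP_EOL;
// prints "foo.com/foo/foo and bar, plus http://www.foo.com/foo?x=foo"
```

The callback is what lets each branch get a different replacement; a plain preg_replace with \1 can only put back the URL or an empty string. With this simple pattern, anything shaped like word.word (for example foo.bar) is treated as a URL, so tighten it if your text contains such tokens.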

I hope this helps.

Good luck.
