php regex to match outside of html tags

I am running preg_replace() on an HTML page. My pattern is meant to wrap certain words in the HTML with a surrounding tag. However, my regular expression sometimes modifies the HTML tags themselves. For example, when I try to replace this text:

<a href="example.com" alt="yasar home page">yasar</a>

So that yasar reads <span class="selected-word">yasar</span>, my regular expression also replaces the yasar in the anchor tag's alt attribute. The preg_replace() call I am currently using looks like this:

preg_replace("/(asf|gfd|oyws)/", '<span class=something>${1}</span>',$target);

How can I write a regular expression that doesn't match anything inside an HTML tag?
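To make the problem concrete, here is a minimal reproduction using the word yasar and the selected-word class from the example above. The pattern has no notion of tags, so it matches the word wherever it appears, including inside the alt attribute. This is a sketch for illustration only.

```php
<?php
// Minimal reproduction: the word and class name come from the example above.
$target = '<a href="example.com" alt="yasar home page">yasar</a>';

// The naive pattern knows nothing about tags, so both occurrences match.
$result = preg_replace('/(yasar)/', '<span class="selected-word">${1}</span>', $target);

echo $result, "\n";
// <a href="example.com" alt="<span class="selected-word">yasar</span> home page"><span class="selected-word">yasar</span></a>
```

The inserted span inside alt="..." breaks the attribute's quoting, which is exactly the damage the question describes.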

Tags: php regex preg-replace pcre

4 answers

唯独是你

#2 · 2019-01-02 16:29

This might be the kind of thing you're after: http://snipplr.com/view/3618/ In general, though, I'd advise against this approach. A better alternative is to strip out all HTML tags and rely on BBCode instead, such as:

[b]bold text[/b] [i]italic text[/i]

However I appreciate that this might not work well with what you're trying to do.

Another option may be HTML Purifier, see: http://htmlpurifier.org/

与风俱净

#3 · 2019-01-02 16:32

Yasar, I'm resurrecting this question because there is another solution that wasn't mentioned.

Instead of just checking that the next angle bracket is an opening <, this solution skips every complete tag, from its < to its >.

With all the usual disclaimers about using regex to parse HTML, here is the regex:

<[^>]*>(*SKIP)(*F)|word1|word2|word3

Here is a demo. In code, it looks like this:

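$target = "word1 <a skip this word2 >word2 again</a> word3";
// A whole tag is matched, then (*SKIP)(*F) discards it; only words outside tags are replaced.
$regex  = "~<[^>]*>(*SKIP)(*F)|word1|word2|word3~";
$repl   = '<span class="">\0</span>'; // \0 is the whole match
$new    = preg_replace($regex, $repl, $target);
echo htmlentities($new);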

Here is an online demo of this code.

Reference
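Applied to the question's own example, the same pattern leaves the alt attribute alone. In the left branch, <[^>]*> consumes a whole tag; (*F) then forces that branch to fail, and (*SKIP) makes the engine resume after the tag instead of retrying at positions inside it, so the word alternatives are only ever tried outside tags. The word list and the preg_quote() escaping below are illustrative additions, not part of the answer above; this is a sketch, not a hardened HTML processor.

```php
<?php
// Sketch: the (*SKIP)(*F) pattern applied to the question's example.
$target = '<a href="example.com" alt="yasar home page">yasar</a>';
$words  = ['yasar']; // illustrative word list; replace with your own

// Escape each word so regex metacharacters are treated literally.
// "~" is passed as the delimiter so preg_quote() escapes it too.
$alternation = implode('|', array_map(function ($w) {
    return preg_quote($w, '~');
}, $words));

// Left branch: a whole tag, discarded via (*SKIP)(*F).
// Right branch: the words to wrap, reached only outside tags.
$regex = '~<[^>]*>(*SKIP)(*F)|(?:' . $alternation . ')~';

$result = preg_replace($regex, '<span class="selected-word">$0</span>', $target);

echo $result, "\n";
// <a href="example.com" alt="yasar home page"><span class="selected-word">yasar</span></a>
```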

长期被迫恋爱

#4 · 2019-01-02 16:36

Off the top of my head, this should work:

echo preg_replace("/<([^>]*)>([^<]*)<\/([^>]*)>/", "<$1><span class=\"some-class\">$2</span></$3>", $target);

But I don't know how safe this would be; I am just presenting a possibility :)

不流泪的眼

#5 · 2019-01-02 16:41

You can use an assertion for that, since you only have to ensure that the searched words occur somewhere after a > or before any <. The latter test is easier to accomplish, because lookahead assertions can be variable-length:

/(asf|foo|barr)(?=[^>]*(<|$))/

See also http://www.regular-expressions.info/lookaround.html for a nice explanation of that assertion syntax.
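Here is that lookahead applied to the question's example, as a sketch. The lookahead (?=[^>]*(<|$)) succeeds only if, scanning forward from the word, a < or the end of the string is reached before any >. Inside a tag the next angle bracket is the tag's closing >, so the occurrence in the alt attribute is not matched. The word yasar and the selected-word class are taken from the question.

```php
<?php
// Sketch: the lookahead approach applied to the question's example.
$target = '<a href="example.com" alt="yasar home page">yasar</a>';

// Match "yasar" only if a "<" (or end of string) comes before any ">".
// Inside a tag the next angle bracket is ">", so the lookahead fails there.
$regex = '/(yasar)(?=[^>]*(<|$))/';

$result = preg_replace($regex, '<span class="selected-word">${1}</span>', $target);

echo $result, "\n";
// <a href="example.com" alt="yasar home page"><span class="selected-word">yasar</span></a>
```

Like any regex approach to HTML, this assumes no stray > appears in attribute values or text content.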
