How to strip tags in a safer way than using strip_

I'm having some problems using PHP's strip_tags function when the string contains 'less than' and 'greater than' signs. For example:

If I do:

strip_tags("<span>some text <5ml and then >10ml some text </span>");

I'll get:

some text 10ml some text

But, obviously I want to get:

some text <5ml and then >10ml some text

Yes, I know that I could use &lt; and &gt;, but I don't have the chance to convert those characters into HTML entities, since the data is already stored as you can see in my example.

What I'm looking for is a clever way to parse the HTML in order to get rid of only the actual HTML tags.

Since TinyMCE was used to generate that data, I know which actual HTML tags could be used in any case, so a strip_tags($string, $black_list) implementation would be more useful than strip_tags($string, $allowable_tags).

Any thoughts?
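To make the blacklist idea concrete, here is a minimal sketch of what such a variant might look like. The function name strip_listed_tags and the tag list are hypothetical (PHP has no built-in blacklist mode), and the approach is a regular expression heuristic rather than a real HTML parser, so it assumes attribute values never contain a '>' character.

```php
<?php
// Hypothetical helper: removes only the opening/closing tags named in $blacklist
// and leaves every other '<' or '>' in the string untouched.
function strip_listed_tags(string $html, array $blacklist): string
{
    if ($blacklist === []) {
        return $html;
    }
    // Quote each tag name so it is matched literally inside the pattern.
    $names = implode('|', array_map(function ($tag) {
        return preg_quote($tag, '#');
    }, $blacklist));
    // Matches <tag ...>, </tag> and <tag />, case-insensitively.
    $pattern = '#</?(?:' . $names . ')\b[^>]*>#i';
    // preg_replace() returns null on error; fall back to the input in that case.
    return preg_replace($pattern, '', $html) ?? $html;
}

$input = "<span>some text <5ml and then >10ml some text </span>";
echo strip_listed_tags($input, ['span', 'p', 'strong', 'em', 'br']);
```

Because "<5ml" does not start with one of the listed tag names, it is left alone, which is the behaviour asked for here.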

Tags: php dom html-parsing strip-tags

3 answers

戒情不戒烟

Reply #2 · 2019-02-22 02:15

Instead of strip_tags(), just use htmlspecialchars().

http://php.net/manual/en/function.htmlspecialchars.php
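A small sketch of what this suggestion does to the string from the question. Note that htmlspecialchars() escapes every angle bracket, so the span tags are not removed; they become visible text when the result is rendered in a browser. The variable names are only illustrative.

```php
<?php
$input = "<span>some text <5ml and then >10ml some text </span>";

// Every '<' and '>' becomes an entity, including those of the real tags.
$escaped = htmlspecialchars($input);

echo $escaped;
// &lt;span&gt;some text &lt;5ml and then &gt;10ml some text &lt;/span&gt;
```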


smile是对你的礼貌

Reply #3 · 2019-02-22 02:16

If you want to have "greater than" and "less than" signs, you need to escape them:

> is &gt;

< is &lt;

See e.g. this: http://www.w3schools.com/html/html_entities.asp
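For illustration, assuming the data had been stored with the entities in place, strip_tags() leaves them alone and html_entity_decode() can turn them back into literal characters afterwards. The variable names are only illustrative.

```php
<?php
// Same text as in the question, but with the two signs escaped as entities.
$stored = "<span>some text &lt;5ml and then &gt;10ml some text </span>";

// strip_tags() removes the span tags and does not touch the entities.
$stripped = strip_tags($stored);

// Decode the entities if the literal characters are wanted (e.g. for plain text).
$plain = html_entity_decode($stripped);

echo $plain; // some text <5ml and then >10ml some text
```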


我欲成王，谁敢阻挡

Reply #4 · 2019-02-22 02:22

As a wacky workaround you could filter non-html brackets with:

$html = preg_replace_callback("# <(?![/a-z]) | (?<=\s)>(?![a-z]) #xi", function ($m) { return htmlentities($m[0]); }, $html);

Apply strip_tags() afterwards. Note that this only works for your specific example and similar cases. It's a regular expression with some heuristics, not artificial intelligence that can discern HTML tags from unescaped angle brackets with another meaning.
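A sketch of the whole pipeline, written with preg_replace_callback() because the e modifier of preg_replace() no longer exists as of PHP 7. The function name strip_tags_keep_brackets is hypothetical, and the final html_entity_decode() step is an assumption about what is wanted: it also decodes any other entities already present in the stored data.

```php
<?php
// Hypothetical helper built around the heuristic pattern from this answer.
function strip_tags_keep_brackets(string $html): string
{
    // Escape '<' not followed by '/' or a letter, and '>' that follows
    // whitespace and is not followed by a letter.
    $escaped = preg_replace_callback(
        '# <(?![/a-z]) | (?<=\s)>(?![a-z]) #xi',
        function (array $m): string {
            return htmlentities($m[0]);
        },
        $html
    );
    if ($escaped === null) {
        // Regex error: fall back to plain strip_tags().
        return strip_tags($html);
    }
    // Remove the real tags, then restore the escaped brackets.
    return html_entity_decode(strip_tags($escaped));
}

$input = "<span>some text <5ml and then >10ml some text </span>";
echo strip_tags_keep_brackets($input); // some text <5ml and then >10ml some text
```

The heuristic has clear limits: in a string such as "x <y", the '<' is followed by a letter, so it is not escaped and strip_tags() still treats it as the start of a tag.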
