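Disallowing some tags with strip_tags (strip_tags disallow some tags)

According to the strip_tags documentation, the second parameter takes the allowed tags. In my case, however, I want the opposite: keep the tags that would otherwise be accepted, and strip only the <script> tag. Is there any way to do this?

I am not asking anyone to write the code for me; any input on possible ways to achieve this (if it is possible at all) would be greatly appreciated.

Answer 1:

EDIT

To use HTML Purifier's HTML.ForbiddenElements configuration directive, it seems you would do something like this: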

require_once '/path/to/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.ForbiddenElements', array('script','style','applet'));
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);

http://htmlpurifier.org/docs

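HTML.ForbiddenElements should be set to an array. What I don't know is what form the array members should take:

array('script','style','applet')

Or:

array('<script>','<style>','<applet>')

Or something else?

I think it is the first form, without delimiters. HTML.AllowedElements uses a configuration string in a form that is fairly common, similar to TinyMCE's valid elements syntax: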

tinyMCE.init({
    ...
    valid_elements : "a[href|target=_blank],strong/b,div[align],br",
    ...
});

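So my guess is that it takes just the element name, and that no attributes should be given (since you are forbidding the whole element, although there is an HTML.ForbiddenAttributes directive too). But that is a guess.

I'll also add this note from the HTML.ForbiddenAttributes documentation:

Warning: This directive complements %HTML.ForbiddenElements, accordingly, check out that directive for a discussion of why you should think twice before using this directive.

Blacklisting is simply not as "robust" as whitelisting, but you may have your reasons. Just beware and be careful.

Without testing, I'm not sure what else to tell you. I'll keep looking for an answer, but I will probably go to bed first. It's very late. :)

Although I think you really should use HTML Purifier and its HTML.ForbiddenElements configuration directive, a reasonable alternative, if you really want to use strip_tags(), is to derive a whitelist from the blacklist. In other words, remove what you don't want from the full list of tags and then use what is left.

For example: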

function blacklistElements($blacklisted = '', &$errors = array()) {
    if ((string)$blacklisted == '') {
        $errors[] = 'Empty string.';
        return array();
    }

    $html5 = array(
        "<menu>","<command>","<summary>","<details>","<meter>","<progress>",
        "<output>","<keygen>","<textarea>","<option>","<optgroup>","<datalist>",
        "<select>","<button>","<input>","<label>","<legend>","<fieldset>","<form>",
        "<th>","<td>","<tr>","<tfoot>","<thead>","<tbody>","<col>","<colgroup>",
        "<caption>","<table>","<math>","<svg>","<area>","<map>","<canvas>","<track>",
        "<source>","<audio>","<video>","<param>","<object>","<embed>","<iframe>",
        "<img>","<del>","<ins>","<wbr>","<br>","<span>","<bdo>","<bdi>","<rp>","<rt>",
        "<ruby>","<mark>","<u>","<b>","<i>","<sup>","<sub>","<kbd>","<samp>","<var>",
        "<code>","<time>","<data>","<abbr>","<dfn>","<q>","<cite>","<s>","<small>",
        "<strong>","<em>","<a>","<div>","<figcaption>","<figure>","<dd>","<dt>",
        "<dl>","<li>","<ul>","<ol>","<blockquote>","<pre>","<hr>","<p>","<address>",
        "<footer>","<header>","<hgroup>","<aside>","<article>","<nav>","<section>",
        "<body>","<noscript>","<script>","<style>","<meta>","<link>","<base>",
        "<title>","<head>","<html>"
    );

    $list = trim(strtolower($blacklisted));
    $list = preg_replace('/[^a-z ]/i', '', $list);
    $list = '<' . str_replace(' ', '> <', $list) . '>';
    $list = array_map('trim', explode(' ', $list));

    return array_diff($html5, $list);
}

Then run it:

$blacklisted = '<html> <bogus> <EM> em li ol';
$errors = array(); // filled by reference inside blacklistElements()
$whitelist = blacklistElements($blacklisted, $errors);

if (count($errors)) {
    echo "There were errors.\n";
    print_r($errors);
    echo "\n";
} else {
    // Do strip_tags() ...
}

http://codepad.org/LV8ckRjd

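So if you pass in what you don't want to allow, it gives you back the list of HTML5 elements in array form, which you can then feed into strip_tags() after joining it into a string:

$stripped = strip_tags($html, implode('', $whitelist));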
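
As a compact illustration of the same idea, here is a minimal sketch. The helper name stripOnly() and the short $known list are hypothetical and exist only for this example; a real list of known tags would be much longer. It also shows one thing to keep in mind: strip_tags() removes the tags themselves but keeps the text between them.

```php
<?php
// Hypothetical helper (not from the answer above): build the allow-list
// by subtracting the forbidden tag names from a list of known tag names.
function stripOnly(string $html, array $forbidden, array $known): string
{
    $forbidden = array_map('strtolower', $forbidden);
    $allowed   = array_diff(array_map('strtolower', $known), $forbidden);

    // strip_tags() expects the allowed tags as one string, e.g. "<p><b><em>".
    $allowable = '';
    foreach ($allowed as $tag) {
        $allowable .= '<' . $tag . '>';
    }

    return strip_tags($html, $allowable);
}

// Deliberately short list for the demo (an assumption, not a complete list).
$known = ['p', 'b', 'em', 'script'];

echo stripOnly('<p>Hi <b>there</b></p><script>alert(1)</script>', ['script'], $known);
// The <script> tags are removed, but the text "alert(1)" stays in the output.
```

If the script body must disappear as well, strip_tags() alone is not enough; the element has to be removed together with its content.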

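Caveat emptor

Now, I have kind of hacked this together, and I know there are some issues I haven't thought through yet. For example, from the strip_tags() manual page, on the $allowable_tags parameter:

Note:
This parameter should not contain whitespace. strip_tags() sees a tag as a case-insensitive string between < and the first whitespace or >. It means that strip_tags("<br/>", "<br>") returns an empty string.

It is late and, for some reason, I can't quite work out what this means for this approach, so I'll have to think about it tomorrow. I also compiled the list of HTML elements in the function's $html5 array from the MDN element documentation page. Sharp-eyed readers may notice that all of the tags are in this form:

<tagName>

I'm not sure how this affects the result, or whether I need to account for variations such as the short tag <tagName/> and some of the, ahem, odder variations. And, of course, there are more tags out there.

So it is probably not production ready. But you get the idea.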

Answer 2:

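First, see what others have said on this topic:

Strip <script> tags and everything in between with PHP?

and

remove script tag from HTML content

It seems you have two options. One is a regular-expression solution, which both of the questions above provide. The other is to use HTML Purifier.

If you are stripping the script tags for some reason other than sanitizing user content, the regular expression may be a good solution. However, as everyone has warned, HTML Purifier is a good idea if you are sanitizing input.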

Answer 3:

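Solution for PHP (5 or greater):

If you want to remove <script> tags (or any other tag), and also want to remove the content inside the tags, you should use:

Option 1 (simple):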

preg_replace('#<script(.*?)>(.*?)</script>#is', '', $text);
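
The same regular expression can be wrapped so that it handles a list of tag names. This is only a sketch: the function name stripTagsWithContent() is made up for illustration, and, like any regex-based approach, it does not cope with malformed or deliberately obfuscated markup.

```php
<?php
// Hypothetical helper: remove each listed element together with its content.
function stripTagsWithContent(string $html, array $tags): string
{
    foreach ($tags as $tag) {
        // Quote the tag name so it is safe to embed in the pattern.
        $name = preg_quote($tag, '#');
        // i = case-insensitive, s = let "." match newlines too.
        $pattern = '#<' . $name . '\b[^>]*>.*?</' . $name . '\s*>#is';
        $html = preg_replace($pattern, '', $html);
    }

    return $html;
}

echo stripTagsWithContent(
    '<p>ok</p><SCRIPT type="x">alert(1)</SCRIPT><style>p{}</style>',
    ['script', 'style']
);
```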

Option 2 (more versatile):

<?php

$html = "<p>Your HTML code</p><script>With malicious code</script>";

$dom = new DOMDocument();

$dom->loadHTML($html);

$script = $dom->getElementsByTagName('script');

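// getElementsByTagName() returns a live list, so collect the nodes first
// and remove them in a second pass; otherwise some nodes get skipped.
$remove = [];
foreach ($script as $item) {
    $remove[] = $item;
}

foreach ($remove as $item) {
    $item->parentNode->removeChild($item);
}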

$html = $dom->saveHTML();

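After this, $html contains the following (saveHTML() on the whole document also emits the DOCTYPE and the <html>/<body> wrappers that loadHTML() adds):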

"<p>Your HTML code</p>"

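The DOM approach extends naturally to several tag names. The sketch below is one possible way to do it, not the answerer's code: the function name removeElements() and the temporary wrapper <div> are assumptions made for the example. The wrapper is there so that only the original fragment is serialized, without the DOCTYPE and <html>/<body> elements.

```php
<?php
// Hypothetical helper: remove the listed elements (and their content)
// from an HTML fragment using DOMDocument.
function removeElements(string $html, array $tags): string
{
    $dom = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($tags as $tag) {
        // Copy the live node list into an array before removing anything.
        $nodes = iterator_to_array($dom->getElementsByTagName($tag));
        foreach ($nodes as $node) {
            $node->parentNode->removeChild($node);
        }
    }

    // Serialize only the children of the wrapper <div>.
    $wrapper = $dom->getElementsByTagName('body')->item(0)->firstChild;
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }

    return $out;
}

echo removeElements('<p>Your HTML code</p><script>With malicious code</script>', ['script']);
```

Note that loadHTML() assumes a legacy encoding unless told otherwise, so non-ASCII input needs extra care that this sketch does not cover.
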
Answer 4:

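This is what I use to strip out a list of forbidden tags. It can remove tags that wrap content (keeping the content) as well as tags together with their content, and it trims the leftover whitespace.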

$description = trim(preg_replace([
    # Strip tags around content (the content itself is kept)
    '/<!doctype\b[^>]*>/i',
    '/<\/?html\b[^>]*>/i',
    '/<\/?head\b[^>]*>/i',
    '/<\/?body\b[^>]*>/i',
    # Strip tags and the content inside them
    '/<script\b[^>]*>.*?<\/script>/is',
], '', $description));

Example input:

$description = '<html>
<head>
</head>
<body>
    <p>This distinctive Mini Chopper with Desire styling has a powerful wattage and high capacity which makes it a very versatile kitchen accessory. It also comes equipped with a durable glass bowl and lid for easy storage.</p>
    <script type="application/javascript">alert(\'Hello world\');</script>
</body>
</html>';

Output:

<p>This distinctive Mini Chopper with Desire styling has a powerful wattage and high capacity which makes it a very versatile kitchen accessory. It also comes equipped with a durable glass bowl and lid for easy storage.</p>

Answer 5:

I use the following:

function strip_tags_with_forbidden_tags($input, $forbidden_tags)
{
    foreach (explode(',', $forbidden_tags) as $tag) {
        $tag = preg_replace(array('/^</', '/>$/'), array('', ''), $tag);
        $input = preg_replace(sprintf('/<%s[^>]*>([^<]+)<\/%s>/', $tag, $tag), '$1', $input);
    }

    return $input;
}

Then you can do:

echo strip_tags_with_forbidden_tags('<cancel>abc</cancel>xpto<p>def></p><g>xyz</g><t>xpto</t>', 'cancel,g');

Outputs 'abcxpto<p>def></p>xyz<t>xpto</t>'

echo strip_tags_with_forbidden_tags('<cancel>abc</cancel> xpto <p>def></p> <g>xyz</g> <t>xpto</t>', 'cancel,g');

Outputs 'abc xpto <p>def></p> xyz <t>xpto</t>'
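
The pattern above only matches when the forbidden element contains plain, non-empty text. A possible variation, sketched here under the assumption that the goal is simply to unwrap the forbidden tags, is to delete the opening and closing tags independently so that nested markup inside them survives. The name unwrapTags() is hypothetical.

```php
<?php
// Hypothetical helper: drop the listed tags but keep whatever they contain,
// including nested markup.
function unwrapTags(string $input, array $forbidden): string
{
    foreach ($forbidden as $tag) {
        $name = preg_quote(trim($tag, '<> '), '#');
        // Matches both <tag ...> and </tag>; \b stops "g" from matching "<gx>".
        $input = preg_replace('#</?' . $name . '\b[^>]*>#i', '', $input);
    }

    return $input;
}

echo unwrapTags('<g>x<b>y</b></g><t>z</t>', ['g']);
```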
