remove script tag from HTML content

2019-01-01 15:11发布

I am using HTML Purifier (http://htmlpurifier.org/)

I just want to remove <script> tags only. I don't want to remove inline formatting or any other things.

How can I achieve this?

One more thing, it there any other way to remove script tags from HTML

11条回答
心情的温度
2楼-- · 2019-01-01 16:00
$html = <<<HTML
...
HTML;
$dom = new DOMDocument();
$dom->loadHTML($html);
$tags_to_remove = array('script','style','iframe','link');
foreach($tags_to_remove as $tag){
    $element = $dom->getElementsByTagName($tag);
    foreach($element  as $item){
        $item->parentNode->removeChild($item);
    }
}
$html = $dom->saveHTML();
查看更多
忆尘夕之涩
3楼-- · 2019-01-01 16:02

This is a simplified variant of Dejan Marjanovic's answer:

function removeTags($html, $tag) {
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    foreach (iterator_to_array($dom->getElementsByTagName($tag)) as $item) {
        $item->parentNode->removeChild($item);
    }
    return $dom->saveHTML();
}

Can be used to remove any kind of tag, including <script>:

$scriptlessHtml = removeTags($html, 'script');
查看更多
余欢
4楼-- · 2019-01-01 16:03

Use the PHP DOMDocument parser.

$doc = new DOMDocument();

// load the HTML string we want to strip
$doc->loadHTML($html);

// get all the script tags
$script_tags = $doc->getElementsByTagName('script');

$length = $script_tags->length;

// for each tag, remove it from the DOM
for ($i = 0; $i < $length; $i++) {
  $script_tags->item($i)->parentNode->removeChild($script_tags->item($i));
}

// get the HTML string back
$no_script_html_string = $doc->saveHTML();

This worked me me using the following HTML document:

<!doctype html>
<html>
    <head>
        <meta charset="utf-8">
        <title>
            hey
        </title>
        <script>
            alert("hello");
        </script>
    </head>
    <body>
        hey
    </body>
</html>

Just bear in mind that the DOMDocument parser requires PHP 5 or greater.

查看更多
初与友歌
5楼-- · 2019-01-01 16:05

use the str_replace function to replace them with empty space or something

$query = '<script>console.log("I should be banned")</script>';

$badChar = array('<script>','</script>');
$query = str_replace($badChar, '', $query);

echo $query; 
//this echoes console.log("I should be banned")

?>

查看更多
几人难应
6楼-- · 2019-01-01 16:08

Shorter:

$html = preg_replace("/<script.*?\/script>/s", "", $html);

When doing regex things might go wrong, so it's safer to do like this:

$html = preg_replace("/<script.*?\/script>/s", "", $html) ? : $html;

So that when the "accident" happen, we get the original $html instead of empty string.

查看更多
登录 后发表回答