Simple HTML sanitizer in JavaScript

Posted 2019-01-14 06:44

Question:

I'm looking for a simple HTML sanitizer written in JavaScript. It doesn't need to be 100% XSS secure.

I'm implementing Markdown and the WMD Markdown editor (the SO master branch from GitHub) on my website. The problem is that the HTML shown in the live preview isn't filtered the way it is here on SO. I am looking for a simple, quick HTML sanitizer written in JavaScript so that I can filter the contents of the preview window.

No need for a full parser with complete XSS protection. I'm not sending the output back to the server. I'm sending the Markdown to the server where I use a proper, full HTML sanitizer before I store the result in the database.

Google is being absolutely useless to me. I just get hundreds of (often incorrect) articles on how to filter out JavaScript from user-generated HTML in all kinds of server-side languages.

UPDATE

I'll explain a bit better why I need this. My website has an editor very similar to the one here on Stack Overflow. There's a text area for entering Markdown syntax and a preview window below it that shows what the post will look like after you submit it.

When the user submits something, it is sent to the server in Markdown format. The server converts it to HTML and then runs an HTML sanitizer on it to clean up the HTML. Markdown allows arbitrary HTML, so I need to clean it up. For example, the user types something like this:

<script>alert('Boo!');</script>

The Markdown converter does not touch it since it's HTML. The HTML sanitizer strips it, so the script element is gone.

But this is not what happens in the preview window. The preview window only converts Markdown to HTML and does not sanitize it, so the preview window will contain a script element. This means the preview differs from the actual rendering produced by the server.

I want to fix this, so I need a quick-and-dirty JavaScript HTML sanitizer. Something simple with basic element/attribute blacklisting and whitelisting will do. It does not need to be XSS safe because XSS protection is done by the server-side HTML sanitizer.

This is just to make sure the preview window will match the actual rendering 99.99% of the time, which is good enough for me.
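The kind of quick-and-dirty whitelist filter described above might look roughly like the sketch below. It is regex-based and makes no claim to being XSS safe (for instance, a `>` inside an attribute value will confuse it). The names `ALLOWED_TAGS` and `sanitizePreview` and the particular whitelist are assumptions made for illustration; they do not come from WMD or any other library, and the whitelist would need to mirror whatever the server-side sanitizer actually allows.

```javascript
// Hypothetical whitelist: tag name -> attributes allowed on that tag.
var ALLOWED_TAGS = {
  a: ['href', 'title'],
  img: ['src', 'alt', 'title'],
  b: [], i: [], em: [], strong: [], p: [], br: [],
  code: [], pre: [], blockquote: [],
  ul: [], ol: [], li: [],
  h1: [], h2: [], h3: []
};

// Elements that are removed together with everything inside them.
var DROP_WITH_CONTENT = /<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi;

// One opening or closing tag: (1) optional slash, (2) tag name, (3) raw attributes.
var TAG = /<(\/?)([a-z][a-z0-9]*)\b([^>]*)>/gi;

// One quoted attribute: (1) attribute name, (2) value including its quotes.
var ATTR = /([a-z-]+)\s*=\s*("[^"]*"|'[^']*')/gi;

function sanitizePreview(html) {
  return html
    .replace(DROP_WITH_CONTENT, '')
    .replace(TAG, function (match, slash, name, attrs) {
      name = name.toLowerCase();
      // Tags that are not whitelisted are dropped; their text content stays.
      if (!Object.prototype.hasOwnProperty.call(ALLOWED_TAGS, name)) {
        return '';
      }
      if (slash) {
        return '</' + name + '>';
      }
      // Rebuild the opening tag from whitelisted attributes only.
      var kept = '';
      attrs.replace(ATTR, function (m, attrName, value) {
        attrName = attrName.toLowerCase();
        var isAllowed = ALLOWED_TAGS[name].indexOf(attrName) !== -1;
        var isScriptUrl = /^["']\s*javascript:/i.test(value);
        if (isAllowed && !isScriptUrl) {
          kept += ' ' + attrName + '=' + value;
        }
        return m;
      });
      return '<' + name + kept + '>';
    });
}
```

In the editor this would be applied to the converter's HTML output just before it is written into the preview element, so the preview and the server-side rendering agree in the common cases.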

Can you help? Thanks in advance!

Answer 1:

You should have a look at the sanitizer recommended in this question: "Sanitize/Rewrite HTML on the Client Side".

And just to be sure that you don't need to do more about XSS, please review the answers to this one: "How to prevent Javascript injection attacks within user-generated HTML".



Answer 2:

For my function, I only cared that the string is not empty and contains only alphanumeric characters. It uses plain JS and no third-party libraries. It contains a long regex, but it does the job ;) You could build on this, but have your regex be something more like '< script >|< /script >' (with characters escaped where necessary, and minus the spaces). ;)

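    // Returns true only for a non-empty string that contains
    // no spaces and none of the punctuation characters listed below.
    var validateString = function (string) {
      // Reject the empty string outright.
      if (string === '') { return false; }

      // Inside a character class "|" is a literal character, so the
      // forbidden characters are listed without separators.
      var forbidden = /[ <,>.?\/:;"'{\[}\]|\\~`!@#$%^&*()_\-+=]/;
      return !forbidden.test(string);
    };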
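The suggestion of a regex along the lines of '<script>|</script>' could be sketched as follows. The names `SCRIPT_TAG`, `containsScriptTag`, and `stripScriptTags` are hypothetical, and this only catches literal script tags, so treat it as a rough preview filter rather than real XSS protection.

```javascript
// Matches an opening or closing script tag, with optional attributes.
var SCRIPT_TAG = /<\/?script\b[^>]*>/i;

// Hypothetical helper: true if the string contains a script tag.
function containsScriptTag(string) {
  return SCRIPT_TAG.test(string);
}

// Hypothetical helper: removes script tags but leaves the text between them.
function stripScriptTags(string) {
  return string.replace(/<\/?script\b[^>]*>/gi, '');
}
```

For example, `containsScriptTag("<script>alert('Boo!');</script>")` reports a match, while ordinary Markdown text passes through untouched.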