How in Django/Python can I ensure safety from WYSI

Posted 2019-06-23 16:52

Question:

I would like to remove XSS / JavaScript injection vulnerabilities from a web application where users are allowed to use an editor like CKEditor, which allows arbitrary HTML. (Whether or not my specific choice of editor allows arbitrary HTML, blackhats will be able to submit arbitrary HTML anyway.) So no JavaScript, whether SCRIPT tags, ONCLICK and family, or whatever else. The target platform is Python and Django.

What are my best options here? I am open to an implementation that would whitelist tags and attributes; that is to say, I don't see it as necessary to allow a user to submit everything that you can build in HTML with only the JavaScript removed. I am happy with a limited set of supported tags, as long as it still allows fairly expressive rich text. I would also be open to an editor that produces Markdown, with all HTML tags stripped before the data is saved. (HTML manipulation seems simpler, but I would also consider Markdown-based solutions.)

I also don't consider it necessary to produce a sanitized text if instead an exception is thrown that says that a submission has failed testing. (Ergo, lowercasing the string, and searching for '<script', 'onclick', etc. might be sufficient.)

Probably my first choice in a solution, if I have the choice, would be a whitelist of tag and attribute names.

What are the best solutions, if any, that are out there?

Answer 1:

If you choose to use a WYSIWYG editor that produces HTML, sanitizing the submitted HTML on the server with the bleach library (which works by whitelisting tags and attributes) is probably enough.
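As a rough illustration of that approach, here is a minimal sketch using `bleach.clean`. It assumes bleach is installed; the helper name `sanitize_html` and the particular tag, attribute, and protocol lists are hypothetical choices for this example and should be adapted to whatever your editor is configured to emit.

```python
import bleach

# Hypothetical allowlists for illustration; tune them to your editor's output.
ALLOWED_TAGS = {"p", "br", "strong", "em", "ul", "ol", "li", "a", "blockquote"}
ALLOWED_ATTRIBUTES = {"a": ["href", "title"]}
ALLOWED_PROTOCOLS = {"http", "https", "mailto"}


def sanitize_html(raw_html: str) -> str:
    """Return raw_html with everything outside the allowlists removed."""
    return bleach.clean(
        raw_html,
        tags=ALLOWED_TAGS,
        attributes=ALLOWED_ATTRIBUTES,
        protocols=ALLOWED_PROTOCOLS,
        strip=True,  # drop disallowed tags instead of escaping them
    )
```

In a Django project this would typically be called before saving, for example in a form's `clean_<field>()` method or in the model's `save()`, so that only sanitized HTML ever reaches the database. Note that with `strip=True` the text inside a removed tag is kept as plain text; only the markup is dropped.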

If you choose to use a Markdown (or other non-HTML markup) editor, you will probably want to save the Markdown source as submitted, then generate the HTML on the server and sanitize it after generation. Because the HTML is sanitized after rendering, you can keep the Markdown as is, inline HTML included. However, if your client-side editor supports preview, you also need to be very careful about in-browser rendering when Markdown is loaded from the server, since that rendering bypasses the server-side sanitizer. Most Markdown editors include client-side sanitizers for this purpose.
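The render-then-sanitize order described here can be sketched as follows. This assumes the Python-Markdown package (`markdown`) and bleach are both installed; the function name `render_markdown` and the allowlists are hypothetical and only for illustration.

```python
import bleach
import markdown

# Hypothetical allowlists covering what basic Markdown rendering produces.
MD_ALLOWED_TAGS = {
    "p", "br", "strong", "em", "code", "pre", "blockquote",
    "ul", "ol", "li", "a", "h1", "h2", "h3",
}
MD_ALLOWED_ATTRIBUTES = {"a": ["href", "title"]}


def render_markdown(source: str) -> str:
    """Render stored Markdown to HTML, then sanitize the generated HTML."""
    # Step 1: render. Inline HTML in the source passes through at this point.
    html = markdown.markdown(source)
    # Step 2: sanitize the rendered output, never the Markdown source.
    return bleach.clean(
        html,
        tags=MD_ALLOWED_TAGS,
        attributes=MD_ALLOWED_ATTRIBUTES,
        strip=True,
    )
```

Sanitizing after rendering matters because Markdown itself can produce dangerous markup, such as a link whose target uses the `javascript:` scheme, that would not be visible as HTML in the source text.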