Using safe filter in Django for rich text fields

Posted 2020-08-14 09:46

Question:

I am using the TinyMCE editor for textarea fields in Django forms.

Now, to display the rich text back to the user, I am forced to use the "safe" filter in Django templates so that the HTML renders in the browser.

Suppose JavaScript is disabled in the user's browser: TinyMCE won't load, and the user could submit <script> or other XSS payloads through such a textarea field. Such HTML won't be safe to display back to the user.

How do I deal with unsafe HTML that doesn't come from TinyMCE?

Answer 1:

You are right to be concerned about raw HTML, but not just for JavaScript-disabled browsers. When considering the security of your server, you have to ignore any work done in the browser and look solely at what the server accepts and what happens to it, because anyone can send a request to your server without going through the editor at all. Your server accepts HTML and displays it on the page. This is unsafe.

The fact that TinyMCE escapes HTML typed into it gives a false sense of security: the server trusts whatever it accepts, which it should not.

The solution to this is to process the HTML when it arrives, to remove dangerous constructs. This is a complicated problem to solve. Take a look at the XSS Cheat Sheet to see the wide variety of inputs that could cause a problem.

lxml has a function to clean HTML: http://lxml.de/lxmlhtml.html#cleaning-up-html, but I've never used it, so I can't vouch for its quality.

Answer 2:

Use django-bleach. It provides a bleach template filter that lets through only the tags you allow:

{% load bleach_tags %}
{{ mymodel.my_html_field|bleach }}

The trick is to configure the editor to produce the same tags as you're willing to 'let through' in your bleach settings.

Here's an example of my bleach settings:

# Which HTML tags are allowed
BLEACH_ALLOWED_TAGS = ['p', 'h3', 'h4', 'em', 'strong', 'a', 'ul', 'ol', 'li', 'blockquote']
# Which HTML attributes are allowed
BLEACH_ALLOWED_ATTRIBUTES = ['href', 'title', 'name']
# Strip disallowed tags entirely instead of escaping them
BLEACH_STRIP_TAGS = True

Then you can configure TinyMCE (or whatever WYSIWYG editor you're using) to offer only the buttons that create the allowed tags.
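A minimal sketch of keeping the editor and the sanitizer in sync, assuming the django-tinymce package (which reads a TINYMCE_DEFAULT_CONFIG setting): generate TinyMCE's valid_elements option from the same lists the bleach settings use, so the editor never produces tags that bleach would strip. The helper tinymce_valid_elements and the ATTRS_BY_TAG mapping are hypothetical, and toolbar button names differ between TinyMCE versions, so check them against the documentation for the version you run.

```python
# settings.py (sketch). Assumes django-bleach and django-tinymce are installed.

# Allowlists shared by the sanitizer (django-bleach) and the editor (TinyMCE).
BLEACH_ALLOWED_TAGS = ['p', 'h3', 'h4', 'em', 'strong', 'a', 'ul', 'ol', 'li', 'blockquote']
BLEACH_ALLOWED_ATTRIBUTES = ['href', 'title', 'name']
BLEACH_STRIP_TAGS = True

# Hypothetical mapping: which allowed attributes the editor may put on each tag.
ATTRS_BY_TAG = {'a': BLEACH_ALLOWED_ATTRIBUTES}


def tinymce_valid_elements(tags, attrs_by_tag):
    """Build a TinyMCE valid_elements string such as 'p,a[href|title]'."""
    parts = []
    for tag in tags:
        attrs = attrs_by_tag.get(tag)
        parts.append(f"{tag}[{'|'.join(attrs)}]" if attrs else tag)
    return ','.join(parts)


TINYMCE_DEFAULT_CONFIG = {
    'menubar': False,
    'plugins': 'lists link',
    # Only buttons that produce allowlisted tags (names assume a recent TinyMCE).
    'toolbar': 'undo redo | bold italic | bullist numlist | link blockquote',
    # TinyMCE removes elements and attributes not listed here when it cleans content.
    'valid_elements': tinymce_valid_elements(BLEACH_ALLOWED_TAGS, ATTRS_BY_TAG),
}
```

The editor config only shapes what honest users produce; the server-side bleach filter is still what actually protects the page, since a request can bypass the editor entirely, as the first answer points out.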

Answer 3:

You can use the template filter "removetags" and just remove 'script'.

Note that removetags has since been removed from Django; it is not available in Django 2.0 or later. Here is the deprecation notice from the docs:

Deprecated since version 1.8: removetags cannot guarantee HTML safe output and has been deprecated due to security concerns. Consider using bleach instead.
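To illustrate why removing only 'script' is not enough, here is a rough stand-in for that kind of filter: a regular expression that deletes <script> and </script> tags but leaves their contents, in the same spirit as removetags (which also removed the tags, not what was between them). The function name strip_script_tags is hypothetical and this is not Django's code. Event-handler attributes and javascript: URLs pass straight through, and a nested tag reassembles itself after a single pass.

```python
import re

# Matches opening and closing <script> tags (with any attributes), case-insensitively.
SCRIPT_TAG = re.compile(r'</?script[^>]*>', re.IGNORECASE)


def strip_script_tags(html):
    """Naive denylist filter: delete <script> tags, keep everything else."""
    return SCRIPT_TAG.sub('', html)


payloads = [
    '<script>alert(1)</script>',                   # tags removed, text left behind
    '<img src="x" onerror="alert(1)">',            # no script tag at all
    '<a href="javascript:alert(1)">click</a>',     # no script tag at all
    '<scr<script>ipt>alert(1)</scr</script>ipt>',  # nested: reassembles a script tag
]

for payload in payloads:
    print(strip_script_tags(payload))
# alert(1)
# <img src="x" onerror="alert(1)">
# <a href="javascript:alert(1)">click</a>
# <script>alert(1)</script>
```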

Answer 4:

There isn't a good answer to this one. TinyMCE generates HTML, and Django's auto-escaping deliberately escapes HTML so that it is shown as plain text.
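To make the conflict concrete: auto-escaping turns markup characters into entities, so stored rich text appears as literal tags, while |safe emits the stored string untouched. This sketch uses Python's html.escape as a stand-in for Django's escape(), which has delegated to it since Django 3.0; the variable names are illustrative.

```python
from html import escape

rich_text = '<p>Hello <strong>world</strong></p>'

# Roughly what {{ field }} outputs with autoescaping on: tags become entities,
# so the browser displays them as text instead of applying the formatting.
autoescaped = escape(rich_text)
print(autoescaped)  # &lt;p&gt;Hello &lt;strong&gt;world&lt;/strong&gt;&lt;/p&gt;

# What {{ field|safe }} outputs: the stored string verbatim, including any
# <script> or onerror= attribute that a user managed to submit.
print(rich_text)
```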

The traditional solution to this problem has been either to use a non-HTML markup language on the input side (BBCode, Markdown, etc.) or to whitelist a limited number of HTML tags. TinyMCE and raw HTML input are generally only appropriate for more or less trusted users.

The whitelist approach is tricky to implement without security holes. The one thing you don't want to do is try to detect only "bad" tags: you WILL miss edge cases.
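A minimal sketch of the shape of the whitelist approach, using only Python's standard library html.parser: rebuild the markup from scratch, emitting only known-good tags and attributes, checking link schemes, and escaping all text, rather than searching for bad patterns. The names sanitize, AllowlistSanitizer and url_is_safe, and the allowlists (loosely mirroring the bleach settings from the second answer), are illustrative assumptions. This is a teaching aid, not a replacement for a maintained sanitizer such as bleach; it ignores concerns like CSS, SVG/MathML, and differences between this parser and real browsers.

```python
from html import escape
from html.parser import HTMLParser

# Example allowlists (assumptions, loosely mirroring the bleach settings).
ALLOWED_TAGS = {'p', 'h3', 'h4', 'em', 'strong', 'a', 'ul', 'ol', 'li', 'blockquote'}
ALLOWED_ATTRS = {'a': {'href', 'title'}}
ALLOWED_SCHEMES = {'http', 'https', 'mailto'}
DROP_CONTENT = {'script', 'style'}  # discard these elements and their text


def url_is_safe(value):
    """Allow relative URLs and allowlisted schemes only."""
    # Browsers ignore whitespace/control characters inside a scheme
    # (e.g. "java\tscript:"), so remove them before checking.
    cleaned = ''.join(ch for ch in value if ch > ' ').lower()
    scheme, sep, _ = cleaned.partition(':')
    if not sep or any(c in scheme for c in '/?#'):
        return True  # no scheme, e.g. "/about" or "#top"
    return scheme in ALLOWED_SCHEMES


class AllowlistSanitizer(HTMLParser):
    """Re-emit only allowlisted tags/attributes; escape all text."""

    def __init__(self):
        # Text arrives with character references decoded; HTMLParser
        # always decodes them in attribute values as well.
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tag: drop the tag itself, keep its text
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, ()):
                continue
            if name == 'href' and not url_is_safe(value):
                continue
            kept.append(f' {name}="{escape(value)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.out.append(f'</{tag}>')

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data))

    # Comments, doctypes and processing instructions use HTMLParser's
    # default no-op handlers, so they never reach the output.


def sanitize(html):
    parser = AllowlistSanitizer()
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)


print(sanitize('<p onclick="steal()">Hi <b>there</b></p><script>alert(1)</script>'))
# <p>Hi there</p>
print(sanitize('<a href="java&#9;script:alert(1)" title="t">x</a><img src=x onerror=alert(1)>'))
# <a title="t">x</a>
```

Note the inversion compared with detecting "bad" tags: anything the code does not recognise is dropped by default, so an unanticipated vector fails closed instead of slipping through.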