I am using a contentEditable div that allows users to edit the body HTML and then post it directly to the site using an AJAX request. Naturally, I have to do some security checks on it. The most obvious was ensuring that no script tags were submitted by searching for <script in the submitted HTML. This is done after first running htmlentities, transferring the data to another server, and then running html_entity_decode. In addition, every tag that is opened must be closed and every tag that is closed must be opened within the user-submitted HTML.
Disregarding unrelated security risks (such as SQL injection) and non-security risks (such as a user posting an inappropriate image), what are other security risks, if any, specifically linked to allowing a user to add HTML directly to a page?
To be more specific,
- Are there ways to put scripts in the page without explicitly using a script tag, OR
- Are there ways to compromise the security of a site or its users by editing the HTML without using scripts?
JavaScript can be invoked in any number of ways through the event attributes on elements, for example:
<body onload="..">
A similar question posted here recommends using HTMLPurifier instead of trying to handle this on your own.
Yes and yes.
There are A LOT of ways for users to inject scripts without script tags.
They can do it in JS handlers
<div onmouseover="myBadScript()" />
They can do it in hrefs
<a href="javascript:myBadScript()">Click me fool!!</a>
They can do it from an external source
<iframe src="http://www.myevilsite.com/mybadscripts.html" />
They can do it in ALL SORTS of ways.
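To make that concrete, here is a minimal sketch of why searching for <script is not enough. It is written in Python rather than the PHP the question uses, and passes_script_check is a made-up stand-in for the check described in the question, not the asker's actual code. All three payloads above pass it.

```python
def passes_script_check(html_text):
    """Hypothetical stand-in for the '<script' substring check from the question."""
    return "<script" not in html_text.lower()


# The three payloads from this answer; none contains a script tag.
payloads = [
    '<div onmouseover="myBadScript()" />',
    '<a href="javascript:myBadScript()">Click me fool!!</a>',
    '<iframe src="http://www.myevilsite.com/mybadscripts.html" />',
]

for payload in payloads:
    # Each line reports whether the naive check lets the payload through.
    print(passes_script_check(payload), payload)
```

A blocklist like this only catches the one spelling it looks for, which is why the answers here lean toward allowlisting or a dedicated library.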
I am afraid that the idea of allowing users to do this is just not a good one. Look at using wiki markup or Markdown instead. It'll be much safer.
Yes. There are an alarming number of ways that malicious code can be injected into your site.
Other answers have already mentioned all of the most obvious ones, but there are a lot of much more subtle ways to get in, and if you're going to accept user-submitted HTML code, you need to be aware of them all, because hackers don't just try the obvious stuff and then give up.
You need to check all event handling attributes - not just onclick, but everything: onfocus, onload, even onerror and onscroll can be hacked.
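Since the list of event attributes keeps growing, one common approach is to flip the check around: instead of hunting for bad attributes, keep only the tags and attributes you explicitly allow and rebuild the markup from scratch. The following is a rough sketch of that idea in Python using the standard library's html.parser. The allowlists, the class name AllowlistSanitizer and the helpers is_safe_url and sanitize are all assumptions made for illustration, and this is not a substitute for a maintained library such as the HTMLPurifier mentioned above.

```python
import re
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlists, chosen only for this example.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li", "a", "br"}
ALLOWED_ATTRS = {"a": {"href"}}
ALLOWED_SCHEMES = {"http", "https", "mailto"}
VOID_TAGS = {"br"}


def is_safe_url(url):
    # Drop whitespace and control characters, which can be used to disguise a scheme.
    cleaned = "".join(ch for ch in url if ord(ch) > 0x20).lower()
    head = re.split(r"[/?#]", cleaned, maxsplit=1)[0]
    if ":" not in head:
        return True  # relative URL, no scheme
    return head.split(":", 1)[0] in ALLOWED_SCHEMES


class AllowlistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # the tag is dropped; any text inside it is kept, escaped
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # this is where onclick, onerror, onscroll, ... disappear
            if name == "href" and not is_safe_url(value or ""):
                continue
            kept.append(f' {name}="{escape(value or "", quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data))  # text is always re-escaped


def sanitize(html_text):
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


print(sanitize('<b onfocus="x()">bold</b>'))
print(sanitize('<a href="javascript:myBadScript()">Click</a>'))
```

Because the output is rebuilt only from allowlisted tag names, allowlisted attributes and escaped text, a new event attribute does not need a new rule. Note that html.parser does not parse exactly the way browsers do, and this sketch does not balance tags, which is one more reason to treat it as an illustration only.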
But more importantly than that, you need to watch out for hacks that are designed to get past your validation. For example, using broken HTML to confuse your parser into thinking it's safe:
<!--<img src="--><img src=fakeimageurl onerror=MaliciousCode();//">
or
<style><img src="</style><img src=fakeimageurl onerror=DoSomethingNasty();//">
or
<b <script>ReallySneakyJavascript();</script>0
All of these could easily slip past a validator.
And don't forget that a real hack is likely to be more obfuscated than this. They'll make an effort to make it hard for you to spot, or to understand what it's doing if you do spot it.
I'll finish by recommending this site: http://html5sec.org/ which has details of a large number of attack vectors, most of which I certainly wouldn't have thought of. (the examples above all feature in the list)
Did you think about the security risk from <object> and <embed> objects?
I'd use strip_tags() for stripping HTML tags.