Please read this statement carefully: let's assume that before ANY elements are added to the document,
all unsafe elements in $dom have been removed. They were, however, initially created. OK, let's continue...
If a piece of user text is processed and could possibly be loaded like so:
var comment = 'I\'m a naughty person!!' +
'<script src="http://blah.com/some_naughty_javascript.js">';
var $dom = $('<div>' + comment + '</div>');
Is this by itself dangerous in any way? My point being, can just the simple act of creating a DOM somehow inject anything, or is it just simply processed and the structure is created?
For example:
var $dom = $('<script>alert("hi");</script>');
Obviously the message "hi" does not pop up until it's added to the document. But:
- Can any tag or anything created in this manner be dangerous?
- Can any functions in javascript/jquery "watch" for elements being created in this manner and act on it BEFORE it's been stripped of bad elements and put on document?
Bounty Edit
So as outlined in the answers below, it seems this method isn't very safe, particularly for one reason:
var $dom = $('<img src="blah.jpg"/>')
-- this will request the image straight away, regardless of whether the object was added to the document.
This creates a major problem for dealing with HTML ajax requests. For example if we wanted to get the values from the inputs of the form:
$.ajax({
    url: 'test.php',
    success: function(responseHTML) {
        var inputs = $(responseHTML).find('form input');
    }
});
This will unintentionally cause the browser to request every image in the response.
Bounty is awarded to anyone:
- Who can provide a nice, safe way of dealing with ajax requests without the above issue.
- Ideally doesn't provide a regex answer... i.e. what if we wanted to do
$(responseHTML).find('img')
-- removing image tags with regex can't be an option, so an unobtrusive way would be needed to stop the src from loading, but still have the same attributes, structure, etc.
Is this by itself dangerous in any way? My point being, can just the simple act of creating a DOM somehow inject anything, or is it just simply processed and the structure is created?
Simply creating a script element without appending it to the DOM will not cause the script to run, since it is purely an object at this point (an HTMLScriptElement). When it is actually appended to the DOM, the script element is evaluated and run by the browser. Other elements are less inert: an img starts loading as soon as it is created, so an inline handler such as onerror can fire without the element ever being appended. That said, I suppose an extremely crafty person could also exploit a bug in a framework or browser you are using to cause an undesired outcome.
Consider this example:
<p>
<input type="button" value="Store 'The Script' In Variable" id="store"/>
<input type="button" value="Append 'The Script' To Dom" id="append"/>
</p>
<br/>
<p>
<input type="button" value="Does nothing"/>
</p>
<h1>The Script</h1>
<pre id="script">
$(function(){
function clickIt(){
$(this).clone().click(clickIt).appendTo("body");
}
$("input[type='button']").val("Now Does Something").click(clickIt);
});
</pre>
var theScript;
$("#store").click(function() {
    // Create the script element, but do not attach it to the document yet.
    theScript = document.createElement('script');
    var scriptText = document.createTextNode($("#script").text());
    theScript.appendChild(scriptText);
});
$("#append").click(function() {
    // Appending to the document is what causes the script to run.
    var head = document.getElementsByTagName('head')[0];
    head.appendChild(theScript);
});
When you click on "store", it will create the HTMLScriptElement and store it in a variable. You will notice that nothing is run even though the object has been created. As soon as you click "append", the script is appended to the DOM and immediately evaluated, and the buttons do something different.
Code Example on jsfiddle
Can any functions in javascript/jquery "watch" for elements being created in this manner and act on it BEFORE it's been stripped of bad elements and put on document?
jQuery sort of does that for you already, as it does some internal script evaluation.
From Karl Swedberg's post on .append():
All of jQuery's insertion methods use a domManip function internally to clean/process elements before and after they are inserted into the DOM. One of the things the domManip function does is pull out any script elements about to be inserted and run them through an "evalScript routine" rather than inject them with the rest of the DOM fragment. It inserts the scripts separately, evaluates them, and then removes them from the DOM.
...
You could alter the behavior of jQuery to remove all <script/> elements and sanitize other elements with inline JavaScript (onclick, mouseover, etc.) when calling append(). However, that will only affect jQuery, as someone could easily use vanilla JavaScript to append the <script/> element.
DOM Mutation Events
DOM Level 2 does define some DOM mutation events to capture elements that are added to the DOM; the one to look at is DOMNodeInserted. However, it is fired after the element has already been added. Note: per Raynos, these events are currently deprecated.
DOMNodeInserted: Fired when a node has been added as a child of another node. This event is dispatched after the insertion has taken place. The target of this event is the node being inserted.
Bubbles: Yes
Cancelable: No
Context Info: relatedNode holds the parent node
In the end, it appears there is no way to totally stop a <script/> from being appended to the DOM via some other JavaScript (at least not that I can find).
The best advice I can give is to never trust user input, as all user input is evil. When you do DOM manipulation, double check to make sure there are no forbidden tags, be it <script/> or even plain <p/> elements, and sanitize all input before it is persisted.
Also, as John points out, you need to worry about any element that can carry an onclick attribute or any other inline JavaScript event handler.
An obligatory response to your first example
var comment = 'I\'m a naughty person!!' +
'<script src="http://blah.com/some_naughty_javascript.js">';
var $dom = $('<div>' + comment + '</div>');
Don’t do this. Instead, you should use an API which treats text as text and doesn’t expose you to injection at all. In this example, you should do this:
var $dom = $('<div>').text(comment);
…which creates a div, then sets its text content. comment is never parsed as HTML, so there is no chance that the browser will ever do anything dangerous with it.
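When the text has to end up inside an HTML string rather than going through .text(), the same idea can be approximated by escaping the characters that HTML treats specially. This is only a minimal sketch; escapeHtml is a hypothetical helper written for illustration, not a jQuery function.

```javascript
// Hypothetical helper: escape the characters that HTML treats specially,
// so the result is displayed as text instead of being parsed as markup.
function escapeHtml(text) {
  var replacements = {
    '&': '&amp;',
    '<': '&lt;',
    '>': '&gt;',
    '"': '&quot;',
    "'": '&#39;'
  };
  return String(text).replace(/[&<>"']/g, function (ch) {
    return replacements[ch];
  });
}

var comment = 'I\'m a naughty person!!' +
  '<script src="http://blah.com/some_naughty_javascript.js">';

// The script tag in the comment is now inert text inside the div markup.
var html = '<div>' + escapeHtml(comment) + '</div>';
```

Where .text() is available it is still the simpler choice, since there is no escaping table to get wrong.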
Moving on to the question
If I understand you correctly, you’d like to parse information out of arbitrary HTML, without the browser getting ready to display it (say, by loading images).
This is tricky, because the DOMs built into web browsers are built to deal with content which will, at some point, be displayed. jQuery (and any library that creates DOM nodes) is limited by this.
DOM Level 2 defines an API to create documents which are completely separate from the active one: document.implementation.createHTMLDocument(title). In my testing, nothing is loaded when an img is created on one of these documents:
var doc = document.implementation.createHTMLDocument(''),
img = doc.createElement('img');
img.src = 'http://example.com/image.jpg'; // Nothing happens.
// Alternatively…
doc.body.innerHTML = '<img src="http://example.com/image.jpg">'; // Nope.
So, a document created this way appears to be a nice sandbox to parse and explore HTML. You can even create a jQuery wrapper around nodes in a different document ($(doc.body)), and explore it through jQuery APIs. When you find the nodes you’re looking for, you can convert them back to HTML for insertion into the active document, or use methods like importNode() and adoptNode() to transfer them directly into the active document.
Unfortunately, support for all of this is new. Firefox supports createHTMLDocument in version 4 and up (a similar createDocument method, which deals with XML, is available in older versions), and Internet Explorer supports it in version 9 and up. Also, as far as I can tell, the spec does not guarantee that images and scripts won’t be preloaded on these documents.
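Applied to the AJAX case from the question, the separate-document approach might look roughly like the sketch below. The function name extractInputValues and its rootDocument parameter are hypothetical, it assumes createHTMLDocument and querySelectorAll are available, and it relies on the observed (not guaranteed) behavior that images in such a document are not loaded.

```javascript
// Hypothetical helper: parse an HTML string inside a separate document and
// read the form inputs from it. `rootDocument` is normally the page's
// `document`; it is a parameter here only so the function is easy to test.
function extractInputValues(responseHTML, rootDocument) {
  var doc = rootDocument.implementation.createHTMLDocument('');
  doc.body.innerHTML = responseHTML;

  var inputs = doc.body.querySelectorAll('form input');
  var values = {};
  for (var i = 0; i < inputs.length; i++) {
    values[inputs[i].name] = inputs[i].value;
  }
  return values;
}

// Usage in a page (not run here):
//   $.ajax({ url: 'test.php', success: function (responseHTML) {
//     var values = extractInputValues(responseHTML, document);
//   } });
```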
A better solution is to avoid the browser’s HTML parser. A number of JavaScript HTML parsers have shown up lately. The simplest is probably John Resig’s Pure JavaScript HTML Parser. You can feed it HTML, and it triggers callbacks when it hits new tags, attributes, and text. From these callbacks, you can create new HTML, build DOM nodes or store the document in whatever form you like, and you can ignore attributes and nodes that you consider dangerous.
You can find an example of this in Dan Kaminsky’s Interpolique, a proof-of-concept which aimed to kill XSS and SQL injection once and for all. The project hasn’t taken off, but if you download Interpolique, you’ll find a safeParse() function stuck at the bottom of htmlparser.js, which uses a whitelist of tag names and attributes, and throws away everything else.
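To illustrate the whitelist idea, here is a minimal sketch of a start-tag filter. It is not Interpolique’s safeParse(); the ALLOWED table, the function names, and the { name, value } attribute shape are all assumptions chosen for the example, standing in for whatever a callback-based parser hands to its start-tag handler.

```javascript
// Hypothetical whitelist: tag names mapped to the attributes allowed on them.
var ALLOWED = {
  p: [],
  b: [],
  i: [],
  a: ['href'],
  input: ['type', 'name', 'value']
};

// Escape a value for use inside a double-quoted attribute.
function escapeAttribute(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Returns a cleaned start tag, or null when the tag is not whitelisted.
// `attrs` is an array of { name, value } objects (assumed shape).
function filterStartTag(tagName, attrs) {
  var tag = tagName.toLowerCase();
  if (!Object.prototype.hasOwnProperty.call(ALLOWED, tag)) {
    return null;
  }
  var kept = [];
  for (var i = 0; i < attrs.length; i++) {
    var name = attrs[i].name.toLowerCase();
    var value = String(attrs[i].value);
    if (ALLOWED[tag].indexOf(name) === -1) {
      continue; // drops onclick, style, src and anything else not listed
    }
    if (name === 'href' && !/^https?:\/\//i.test(value)) {
      continue; // drops javascript: and other unexpected URL schemes
    }
    kept.push(' ' + name + '="' + escapeAttribute(value) + '"');
  }
  return '<' + tag + kept.join('') + '>';
}
```

Listing what is allowed, rather than what is forbidden, means an attribute or tag nobody thought of is dropped by default.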
jsdom is a complete (up to DOM Level 2, with some Level 3) HTML DOM written in JavaScript — you could use it to work with HTML safely. You can even load up its own copy of jQuery. However, it’s written for CommonJS, without browser compatibility in mind. I don’t know if it will work in most web browsers without modification. It’s also a big library.
If at all possible, the ideal solution is to serve AJAX responses in a format other than HTML. Do you need to include the extra, unsafe HTML at all? If you do the work on the server of returning only what you’ll need…
{
    "inputs": [
        "<input …>",
        "<input …>"
    ]
}
…your job on the client side becomes much easier.
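On the client, a response of that shape can then be handled as data. The sketch below is one way to do it; readInputs is a hypothetical helper, and the shape check simply mirrors the example response above.

```javascript
// Hypothetical helper: parse a JSON response with an "inputs" array and
// return that array, rejecting anything that is not a list of strings.
function readInputs(responseText) {
  var data = JSON.parse(responseText); // throws SyntaxError on invalid JSON
  if (!data || !Array.isArray(data.inputs)) {
    throw new TypeError('Expected an object with an "inputs" array');
  }
  for (var i = 0; i < data.inputs.length; i++) {
    if (typeof data.inputs[i] !== 'string') {
      throw new TypeError('Expected every entry in "inputs" to be a string');
    }
  }
  return data.inputs;
}
```

Because the server has already picked out the relevant markup, nothing else from the original page (images, scripts) ever reaches the browser’s parser.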
Excellent question. It appears it is possible to inject script and place event handlers inside it. I've tested using the following HTML:
<!DOCTYPE html>
<html lang="en">
<head>
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1/jquery.min.js"></script>
<script type="text/javascript">
<!--
$(function() {
var $dom = $('<script>$(document).on("click", ".testbutton", function() { alert("hi") });</script>');
$(".firstbutton").click(function() {
$("BODY").append($dom);
});
});
-->
</script>
</head>
<body style="padding:0">
<button class="firstbutton">Click this first</button>
<button class="testbutton">Then this</button>
</body>
</html>
You can see that the second button has no effect until the first button has been clicked and the script tag added to the DOM.
This could be open to abuse if user form input is taken and dynamically inserted into the page, in which case data sanitisation should definitely be used.
This is something I'd never thought of - thanks for raising it.
It seems that the script will not run until it is appended to the DOM.
$(function ()
{
    var ss = document.createElement('script');
    var scr = 'alert("bah");';
    var tt = document.createTextNode(scr);
    ss.appendChild(tt);
    var hh = document.getElementsByTagName('head')[0];
    // Not appended, so the alert never fires.
    //hh.appendChild(ss);
});
And
$(function ()
{
    var ss = document.createElement('script');
    var scr = 'alert("bah");';
    var tt = document.createTextNode(scr);
    ss.appendChild(tt);
    var hh = document.getElementsByTagName('head')[0];
    // Appended to the document, so the alert fires.
    hh.appendChild(ss);
});