Earlier today a question was asked regarding input validation strategies in web apps.
The top answer, at time of writing, suggests in PHP just using htmlspecialchars and mysql_real_escape_string.
My question is: Is this always enough? Is there more we should know? Where do these functions break down?
An important piece of this puzzle is contexts. Someone sending "1 OR 1=1" as the ID is not a problem if you quote every argument in your query:
Which results in:
which is ineffectual. Since you're escaping the string, the input cannot break out of the string context. I've tested this as far as version 5.0.45 of MySQL, and using a string context for an integer column does not cause any problems.
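A minimal sketch of that quoting approach, with some loud assumptions: the table and column names are hypothetical, and addslashes() is only a stand-in for mysql_real_escape_string() so that the snippet runs without a database connection (the two are not equivalent in real code).

```php
<?php
// Hypothetical query builder: the value is escaped AND wrapped in quotes,
// so it stays inside a string context.
function build_quoted_query(string $id): string
{
    // addslashes() stands in for mysql_real_escape_string() in this sketch.
    return "SELECT name FROM users WHERE id = '" . addslashes($id) . "'";
}

// The OR clause ends up inside the string literal instead of being parsed as SQL.
echo build_quoted_query("1 OR 1=1"), "\n";
// A quote in the input is escaped, so it cannot close the literal early.
echo build_quoted_query("1' OR '1'='1"), "\n";
```

Whether a quoted string compares cleanly against an integer column is up to the database; the point here is only that the input never leaves the quotes.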
why, oh WHY, would you not include quotes around user input in your sql statement? seems quite silly not to! including quotes in your sql statement would render "1 or 1=1" a fruitless attempt, no?
so now, you'll say, "what if the user includes a quote (or double quotes) in the input?"
well, easy fix for that: just remove user input'd quotes. eg: $input =~ s/'//g; now, it seems to me anyway, that user input would be secured...
When it comes to database queries, always try and use prepared parameterised queries. The mysqli and PDO libraries support this. This is far safer than using escaping functions such as mysql_real_escape_string.
Yes, mysql_real_escape_string is effectively just a string escaping function. It is not a magic bullet. All it will do is escape dangerous characters so that they are safe to use in a single query string. However, if you do not sanitise your inputs beforehand, then you will be vulnerable to certain attack vectors.
Imagine the following SQL:
You should be able to see that this is vulnerable to exploit.
Imagine the id parameter contained the common attack vector 1 OR 1=1.
There are no risky chars in there to encode, so it will pass straight through the escaping filter. Leaving us:
Which is a lovely SQL injection vector and would allow the attacker to return all the rows. Or
which produces
Which allows the attacker to return the first administrator's details in this completely fictional example.
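A rough sketch of the walk-through above, followed by the same lookup done as a prepared statement for contrast. Everything concrete here is an assumption: the users table, its columns, the exact wording of the second payload, addslashes() as a stand-in for mysql_real_escape_string(), and an in-memory SQLite database through PDO (which needs the pdo_sqlite extension) so that the snippet runs on its own.

```php
<?php
// Escaped but NOT quoted: the value lands in a numeric context.
function build_unquoted_query(string $id): string
{
    // addslashes() stands in for mysql_real_escape_string() in this sketch.
    return "SELECT name FROM users WHERE id = " . addslashes($id);
}

// Neither payload contains a character the escaping function cares about.
$all_rows    = build_unquoted_query("1 OR 1=1");
$first_admin = build_unquoted_query("1 OR is_admin=1 ORDER BY id LIMIT 1");
echo $all_rows, "\n", $first_admin, "\n";

// Prepared statement: the SQL text is fixed and the value is sent separately.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER, name TEXT, is_admin INTEGER)');
$pdo->exec("INSERT INTO users VALUES (1, 'alice', 0), (2, 'bob', 1), (3, 'carol', 0)");

function find_names(PDO $pdo, string $id): array
{
    $stmt = $pdo->prepare('SELECT name FROM users WHERE id = ?');
    $stmt->execute([$id]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

$legit  = find_names($pdo, '1');          // matches the row with id 1
$attack = find_names($pdo, '1 OR 1=1');   // the whole string is treated as one value
```

How a database compares that leftover string with an integer column varies, but with the placeholder the OR is never executed as SQL, so the query cannot be widened to every row.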
Whilst these functions are useful, they must be used with care. You need to ensure that all web inputs are validated to some degree. In this case, we see that we can be exploited because we didn't check that a variable we were using as a number was actually numeric. In PHP you should make wide use of the functions that check whether inputs are integers, floats, alphanumeric etc. But when it comes to SQL, heed most the value of the prepared statement. The above code would have been secure if it was a prepared statement, as the database functions would have known that 1 OR 1=1 is not a valid literal.
As for htmlspecialchars(). That's a minefield of its own.
There's a real problem in PHP in that it has a whole selection of different HTML-related escaping functions, and no clear guidance on exactly which functions do what.
Firstly, if you are inside an HTML tag, you are in real trouble. Look at
We're already inside an HTML tag, so we don't need < or > to do anything dangerous. Our attack vector could just be javascript:alert(document.cookie)
Now resultant HTML looks like
The attack gets straight through.
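A minimal sketch of this case, assuming the value comes from a hypothetical imagesrc request parameter. ENT_COMPAT is passed explicitly so that the behaviour does not depend on the PHP version.

```php
<?php
// Hypothetical attacker-supplied value for an image URL.
$imagesrc = 'javascript:alert(document.cookie)';

// None of the characters htmlspecialchars() encodes (& < > ") appear in the
// payload, so the string comes back unchanged.
$escaped = htmlspecialchars($imagesrc, ENT_COMPAT, 'UTF-8');

echo '<img src="' . $escaped . '" />', "\n";
```

The escaping did its job for the HTML syntax; the problem is that the attribute itself accepts a URL, and the scheme was never checked.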
It gets worse. Why? Because htmlspecialchars (when called this way) only encodes double quotes and not single. So if we had
Our evil attacker can now inject whole new parameters
gives us
In these cases, there is no magic bullet; you just have to sanitise the input yourself. If you try to filter out bad characters you will surely fail. Take a whitelist approach and only let through the chars which are good. Look at the XSS cheat sheet for examples of how diverse vectors can be.
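A minimal sketch of the single-quote problem and of one possible whitelist check. The payload, the is_safe_image_name() helper and its list of allowed extensions are all hypothetical. ENT_COMPAT is passed explicitly because it was the default only before PHP 8.1; newer versions default to ENT_QUOTES.

```php
<?php
$payload = "pic.png' onclick='alert(1)";   // hypothetical attacker input

// ENT_COMPAT leaves single quotes alone.
$compat = htmlspecialchars($payload, ENT_COMPAT, 'UTF-8');
// ENT_QUOTES encodes both kinds of quote.
$quotes = htmlspecialchars($payload, ENT_QUOTES, 'UTF-8');

echo "<img src='" . $compat . "' />\n";   // the payload closes the attribute
echo "<img src='" . $quotes . "' />\n";   // the payload stays inside it

// Whitelist: accept only a simple file name made of known-good characters.
function is_safe_image_name(string $name): bool
{
    return preg_match('/\A[A-Za-z0-9_-]+\.(?:png|jpe?g|gif)\z/', $name) === 1;
}

var_dump(is_safe_image_name('pic.png'));   // bool(true)
var_dump(is_safe_image_name($payload));    // bool(false)
```

The whitelist is deliberately narrow: it would also reject perfectly legitimate names containing spaces or paths, which is the usual trade-off of only letting through what is known to be good.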
Even if you use htmlspecialchars($string) outside of HTML tags, you are still vulnerable to multi-byte charset attack vectors.
The most effective you can be is to use a combination of mb_convert_encoding and htmlentities as follows.
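A minimal sketch of that combination, assuming UTF-8 output and that the mbstring extension is loaded. The escape_html() wrapper name is hypothetical.

```php
<?php
function escape_html(string $str): string
{
    // Converting UTF-8 to UTF-8 replaces byte sequences that are not valid UTF-8.
    $str = mb_convert_encoding($str, 'UTF-8', 'UTF-8');
    // ENT_QUOTES so that single as well as double quotes are encoded.
    return htmlentities($str, ENT_QUOTES, 'UTF-8');
}

echo escape_html("<b>'x'</b>"), "\n";
// prints: &lt;b&gt;&#039;x&#039;&lt;/b&gt;
```

The charset passed to htmlentities() has to match the charset the page is actually served in, otherwise the clean-up step and the browser disagree about where characters begin and end.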
Even this leaves IE6 vulnerable, because of the way it handles UTF. However, you could fall back to a more limited encoding, such as ISO-8859-1, until IE6 usage drops off.
For a more in-depth study of the multibyte problems, see https://stackoverflow.com/a/12118602/1820
In addition to Cheekysoft's excellent answer:
There isn't really a silver bullet for preventing HTML injection (e.g. cross site scripting), but you may be able to achieve it more easily if you're using a library or templating system for outputting HTML. Read the documentation for that for how to escape things appropriately.
In HTML, things need to be escaped differently depending on context. This is especially true of strings being placed into Javascript.
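To illustrate how the context changes the escaping, here is a small sketch. Using json_encode() with the JSON_HEX_* flags for the JavaScript case is one common PHP approach and an assumption on my part, not something spelled out above; the variable names and the sample value are hypothetical.

```php
<?php
$name = "</script><script>alert('x')</script>";   // hypothetical untrusted value

// HTML body context: entity-encode.
$for_html = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');

// JavaScript string context inside a <script> block: JSON-encode, with the
// JSON_HEX_* flags turning <, >, &, ' and " into \uXXXX escapes.
$for_js = json_encode($name, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);

echo "<p>Hello, {$for_html}</p>\n";
echo "<script>var name = {$for_js};</script>\n";
```

The same value needs two different encodings here; using the HTML-escaped form inside the script block, or the JSON form inside the paragraph, would be wrong in both directions.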
I would definitely agree with the above posts, but I have one small thing to add in reply to Cheekysoft's answer, specifically:
I coded up a quick little function that I put in my database class that will strip out anything that isn't a number. It uses preg_replace, so there is probably a more optimized way to do it, but it works in a pinch...
So instead of using
I would use
and it would safely run the query
Sure, that just stopped it from displaying the correct row, but I don't think that is a big issue for whoever is trying to inject SQL into your site ;)
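A minimal sketch of such a function. The name digits_only(), the fallback to "0" when nothing is left, and the users table are assumptions. ctype_digit() is shown alongside as one of PHP's built-in checks, for the case where rejecting bad input is preferable to rewriting it.

```php
<?php
// Hypothetical helper: keep only the digits 0-9; "0" if nothing is left.
function digits_only(string $input): string
{
    $digits = preg_replace('/[^0-9]/', '', $input);
    return ($digits === null || $digits === '') ? '0' : $digits;
}

$id = '1 OR 1=1';
echo 'SELECT name FROM users WHERE id = ' . digits_only($id), "\n";
// prints: SELECT name FROM users WHERE id = 111

// Stricter alternative: reject the input instead of rewriting it.
$is_numeric_id = ctype_digit($id);   // false for this input
```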
Works well, even better on 64 bit systems. Beware of your system's limitations on addressing large numbers though, but for database ids this works great 99% of the time.
You should be using a single function/method for cleaning your values as well, even if this function is just a wrapper for mysql_real_escape_string(). Why? Because one day, when an exploit for your preferred method of cleaning data is found, you only have to update it in one place, rather than doing a system-wide find and replace.
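A minimal sketch of the idea. The db_clean() name is hypothetical, and PDO::quote() on an in-memory SQLite connection (which needs pdo_sqlite) is swapped in for mysql_real_escape_string() purely so that the snippet runs without a MySQL server.

```php
<?php
// Hypothetical single entry point for cleaning values that go into SQL strings.
function db_clean(PDO $pdo, string $value): string
{
    // Today this is only a thin wrapper around the driver's own quoting.
    // If that method ever proves flawed, this is the one place to change.
    return $pdo->quote($value);
}

$pdo = new PDO('sqlite::memory:');
echo 'SELECT name FROM users WHERE name = ' . db_clean($pdo, "O'Reilly"), "\n";
```

Every call site goes through db_clean(), so swapping the implementation later does not touch the rest of the code.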