Are these two functions overkill for sanitization?

2019-01-12 06:57发布

问题:

function sanitizeString($var)
{
    $var = stripslashes($var);
    $var = htmlentities($var);
    $var = strip_tags($var);
    return $var;
}

function sanitizeMySQL($var)
{
    $var = mysql_real_escape_string($var);
    $var = sanitizeString($var);
    return $var;
}

I got these two functions from a book and the author says that by using these two, I can be extra safe against XSS(the first function) and sql injections(2nd func). Are all those necessary?

Also for sanitizing, I use prepared statements to prevent sql injections.

I would use it like this:

$variable = sanitizeString($_POST['user_input']);
$variable = sanitizeMySQL($_POST['user_input']);

EDIT: Get rid of strip_tags for the 1st function because it doesn't do anything. Would using these two functions be enough to prevent the majority of attacks and be okay for a public site?

回答1:

It's true, but this level of escaping may not be appropriate in all cases. What if you want to store HTML in a database?

Best practice dictates that, rather than escaping on receiving values, you should escape them when you display them. This allows you to account for displaying both HTML from the database and non-HTML from the database, and it's really where this sort of code logically belongs, anyway.

Another advantage of sanitizing outgoing HTML is that a new attack vector may be discovered, in which case sanitizing incoming HTML won't do anything for values that are already in the database, while outgoing sanitization will apply retroactively without having to do anything special

Also, note that strip_tags in your first function will likely have no effect, if all of the < and > have become &lt; and &gt;.



回答2:

To be honest, I think the author of these function has either no idea what XSS and SQL injections are or what exactly the used function do.

Just to name two oddities:

  • Using stripslashes after mysql_real_escape_string removes the slashes that were added by mysql_real_escape_string.
  • htmlentities replaces the chatacters < and > that are used in strip_tags in order to identify tags.

Furthermore: In general, functions that protect agains XSS are not suitable to protect agains SQL injections and vice versa. Because each language and context hast its own special characters that need to be taken care of.

My advice is to learn why and how code injection is possible and how to protect against it. Learn the languages you are working with, especially the special characters and how to escape these.


Edit   Here’s some (probably weird) example: Imagine you allow your users to input some value that should be used as a path segment in a URI that you use in some JavaScript code in a onclick attribute value. So the language context looks like this:

  • HTML attribute value
    • JavaScript string
      • URI path segment

And to make it more fun: You are storing this input value in a database.

Now to store this input value correctly into your database, you just need to use a proper encoding for the context you are about to insert that value into your database language (i.e. SQL); the rest does not matter (yet). Since you want to insert it into a SQL string declaration, the contextual special characters are the characters that allow you to change that context. As for string declarations these characters are (especially) the ", ', and \ characters that need to be escaped. But as already stated, prepared statements do all that work for you, so use them.

Now that you have the value in your database, we want to output them properly. Here we proceed from the innermost to the outermost context and apply the proper encoding in each context:

  • For the URI path segment context we need to escape (at least) all those characters that let us change that context; in this case / (leave current path segment), ?, and # (both leave URI path context). We can use rawurlencode for this.
  • For the JavaScript string context we need to take care of ", ', and \. We can use json_encode for this (if available).
  • For the HTML attribute value we need to take care of &, ", ', and <. We can use htmlspecialchars for this.

Now everything together:

'… onclick="'.htmlspecialchars('window.open("http://example.com/'.json_encode(rawurlencode($row['user-input'])).'")').'" …'

Now if $row['user-input'] is "bar/baz" the output is:

… onclick="window.open(&quot;http://example.com/&quot;%22bar%2Fbaz%22&quot;&quot;)" …

But using all these function in these contexts is no overkill. Because although the contexts may have similar special characters, they have different escape sequences. URI has the so called percent encoding, JavaScript has escape sequences like \" and HTML has character references like &quot;. And not using just one of these functions will allow to break the context.



回答3:

You are doing htmlentities (which turns all > into &gt;) and then calling strip_tags which at this point will not accomplish anything more, since there are no tags.



回答4:

If you're using prepared statements and SQL placeholders and never interpolating user input directly into your SQL strings, you can skip the SQL sanitization entirely.

When you use placeholders, the structure of the SQL statement (SELECT foo, bar, baz FROM my_table WHERE id = ?) is send to the database engine separately from the data values which are (eventually) bound to the placeholders. This means that, barring major bugs in the database engine, there is absolutely no way for the data values to be misinterpreted as SQL instructions, so this provides complete protection from SQL injection attacks without requiring you to mangle your data for storage.



回答5:

No, this isn't overkill this is a vulnerability.

This code completely vulnerable to SQL Injection. You are doing a mysql_real_escape_string() and then you are doing a stripslashes(). So a " would become \" after mysql_real_escape_string() and then go back to " after the stripslashes(). mysql_real_escape_string() alone is best to stop sql injection. Parameterized query libraries like PDO and ADODB uses it, and Parameterized queries make it very easy to completely stop sql injection.

Go ahead test your code:

$variable = sanitizeString($_POST['user_input']);
$variable = sanitizeMySQL($_POST['user_input']);
mysql_query("select * from mysql.user where Host='".$variable."'");

What if:

$_POST['user_input'] = 1' or 1=1 /*

Patched:

mysql_query("select * from mysql.user where Host='".mysql_real_escape_string($variable)."'");

This code is also vulnerable to some types XSS:

$variable = sanitizeString($_POST['user_input']);
$variable = sanitizeMySQL($_POST['user_input']);
print("<body background='http://localhost/image.php?".$variable."' >");

What if:

$_POST['user_input']="' onload=alert(/xss/)";

patched:

$variable=htmlspecialchars($variable,ENT_QUOTES);
print("<body background='http://localhost/image.php?".$variable."' >");

htmlspeicalchars is encoding single and double quotes, make sure the variable you are printing is also encased in quotes, this makes it impossible to "break out" and execute code.



回答6:

Well, if you don't want to reinvent the wheel you can use HTMLPurifier. It allows you to decide exactly what you want and what you don't want and prevents XSS attacks and such



回答7:

I wonder about the concept of sanitization. You're telling Mysql to do exactly what you want it to do: run a query statement authored in part by the website user. You're already constructing the sentence dynamically using user input - concatenating strings with data supplied by the user. You get what you ask for.

Anyway, here's some more sanitization methods...

1) For numeric values, always manually cast at least somewhere before or while you build the query string: "SELECT field1 FROM tblTest WHERE(id = ".(int) $val.")";

2) For dates, convert the variable to unix timestamp first. Then, use the Mysql FROM_UNIXTIME() function to convert it back to a date. "SELECT field1 FROM tblTest WHERE(date_field >= FROM_UNIXTIME(".strtotime($val).")";. This is actually needed sometimes anyway to deal with how Mysql interprets and stores dates different from the script or OS layers.

3) For short and predictable strings that must follow a certain standard (username, email, phone number, etc), you can a) do prepared statements; or b) regex or other data validation.

4) For strings which wouldn't follow any real standard and which may or may not have pre- or double-escaped and executable code all over the place (text, memos, wiki markup, links, etc), you can a) do prepared statements; or b) store to and convert from binary/blob form - converting each character to binary, hex, or decimal representation before even passing the value to the query string, and converting back when extracting. This way you can focus more on just html validation when you spit the stored value back out.