Efficiently sanitize user-entered text

I have an HTML form that accepts user-entered text of about 1000 characters. The form is submitted to a PHP page, which stores the text in a MySQL database. I use PDO with prepared statements to prevent SQL injection. But what else do I need to do to sanitize the text the user enters?

I want to prevent any script injection, XSS attacks, etc.

Answer 1:

Security is an interesting concept and attracts a lot of people. Unfortunately, it's a complex subject, and even the professionals get it wrong. I've found security holes in Google (CSRF), Facebook (more CSRF), several major online retailers (mainly SQL injection / XSS), as well as thousands of smaller sites, both corporate and personal.

These are my recommendations:

1) Use parameterised queries
Parameterised queries force the values passed to the query to be treated as separate data, so that the input values cannot be parsed as SQL code by the DBMS. A lot of people will recommend that you escape your strings using mysql_real_escape_string(), but contrary to popular belief it is not a catch-all solution to SQL injection. Take this query for example:

SELECT * FROM users WHERE userID = $_GET['userid']

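If $_GET['userid'] is set to 1 OR 1=1, it contains no quotes or other characters that mysql_real_escape_string() would escape, so it passes through unchanged and the query becomes WHERE userID = 1 OR 1=1. This results in all rows being returned. Or, even worse, what if it's set to 1 OR is_admin = 1?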

Parameterised queries prevent this kind of injection from occurring.
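A minimal sketch of the difference, using Python's built-in sqlite3 module as a stand-in for PHP's PDO and MySQL (the users table and its rows are invented for the demo). PDO's prepared statements follow the same principle, though the driver and API differ:

```python
import sqlite3

# Invented schema standing in for the MySQL "users" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (userID INTEGER PRIMARY KEY, name TEXT, is_admin INTEGER)"
)
conn.executemany(
    "INSERT INTO users (userID, name, is_admin) VALUES (?, ?, ?)",
    [(1, "alice", 0), (2, "bob", 0), (3, "root", 1)],
)


def find_user_unsafe(uid):
    # Vulnerable: uid is pasted into the SQL text, so "1 OR 1=1" becomes SQL.
    return conn.execute(f"SELECT name FROM users WHERE userID = {uid}").fetchall()


def find_user_safe(uid):
    # Parameterised: uid is sent separately and is only ever treated as a value.
    return conn.execute("SELECT name FROM users WHERE userID = ?", (uid,)).fetchall()


attack = "1 OR 1=1"  # what an attacker might put in $_GET['userid']
print(find_user_unsafe(attack))  # all three rows
print(find_user_safe(attack))    # no rows in this SQLite demo
```

In the parameterised version the OR clause is never executed. How the database then compares the string with an integer column varies (MySQL, for instance, converts strings to numbers in such comparisons), which is one more reason to also validate types.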

2) Validate your inputs
Parameterised queries are great, but unexpected values can still cause problems in your application logic. Make sure that every value has the expected type, lies within the expected range, and won't allow the current user to alter something they shouldn't be able to.

For example, you might have a password change form that sends a POST request to a script that changes the user's password. If you put their user ID in a hidden field in the form, they can change it: sending id=123 instead of id=321 might let them change someone else's password. Make sure that EVERYTHING is validated correctly, in terms of type, range and access.
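A rough sketch of those three checks (type, range, access) in Python. The Session class and the function names are hypothetical; in a real PHP app the logged-in user's id would come from $_SESSION rather than from the form:

```python
from dataclasses import dataclass


@dataclass
class Session:
    # Hypothetical server-side session: the server, not the form, knows who is logged in.
    user_id: int


def parse_user_id(raw: str) -> int:
    """Type and range check: accept only a positive ASCII integer."""
    if not (raw.isascii() and raw.isdigit()):
        raise ValueError("user id must be a positive integer")
    value = int(raw)
    if value <= 0:
        raise ValueError("user id out of range")
    return value


def can_change_password(session: Session, posted_id: str) -> bool:
    """Access check: the posted id must belong to the logged-in user."""
    try:
        target = parse_user_id(posted_id)
    except ValueError:
        return False
    return target == session.user_id


session = Session(user_id=321)
print(can_change_password(session, "321"))  # their own account
print(can_change_password(session, "123"))  # someone else's account
```

Often the simplest fix is not to put the user ID in the form at all and to read it from the session instead.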

3) Use htmlspecialchars to escape displayed user input
Let's say your user enters their "about me" as something like this:
</div><script>alert('hello!');</script><div>
The problem is that your page will then output markup the user wrote, and the browser will run their script. Trying to filter this yourself with blacklists is a bad idea. Instead, pass the string through htmlspecialchars() so that characters such as <, > and & are converted to HTML entities (&lt;, &gt;, &amp;) and are displayed as text rather than parsed as markup.
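For illustration, Python's standard html.escape() does the same job as htmlspecialchars(). This is a Python analogue, not the PHP function itself; one visible difference is that the apostrophe is encoded as &#x27; rather than PHP's &#039;:

```python
import html

about_me = "</div><script>alert('hello!');</script><div>"

# html.escape() turns &, <, > and (because quote=True) both quote characters
# into entities, similar to htmlspecialchars() with ENT_QUOTES.
safe = html.escape(about_me, quote=True)

# The browser now shows the text literally instead of running the script.
page = f'<div class="about-me">{safe}</div>'
print(page)
```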

4) Don't use $_REQUEST
Cross-site request forgery (CSRF) attacks work by getting the user to click a link or visit a URL that triggers a script that performs an action on a site they are logged in to. The $_REQUEST variable is a combination of $_GET, $_POST and (depending on the request_order and variables_order settings) $_COOKIE, which means that you can't tell the difference between a variable that was sent in a POST request (i.e. through an input tag in your form) and a variable that was set in your URL as part of a GET (e.g. page.php?id=1).

Let's say the user wants to send a private message to someone. They might send a POST request to sendmessage.php, with to, subject and message as parameters. Now let's imagine someone sends a GET request instead:

sendmessage.php?to=someone&subject=SPAM&message=VIAGRA!

If you're using $_POST, you won't see any of those parameters, because they are in $_GET instead. Your code won't find $_POST['to'] or any of the other variables, so it won't send the message. However, if you're using $_REQUEST, $_GET and $_POST are merged together, so an attacker can set those parameters as part of the URL. When the user visits that URL, they inadvertently send the message. The really worrisome part is that the user doesn't have to do anything. If the attacker creates a malicious page, it could contain an iframe that points to the URL. Example:

<iframe src="http://yoursite.com/sendmessage.php?to=someone&subject=SPAM&message=VIAGRA!">
</iframe>

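This results in the user sending messages to people without ever realising they did anything. For this reason, you should avoid $_REQUEST and use $_POST and $_GET instead. Note that this only blocks the simple GET-based attack: a malicious page can still submit a hidden POST form automatically, so state-changing requests should also carry a per-session CSRF token that your script checks.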
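To make the merging problem concrete, here is a simplified Python model of the two handlers. merged_params() is a hypothetical stand-in for $_REQUEST, not a PHP API, and the method/query/form arguments model the incoming request:

```python
from typing import Mapping, Optional, Tuple

REQUIRED = ("to", "subject", "message")
Message = Tuple[str, str, str]


def merged_params(query: Mapping[str, str], form: Mapping[str, str]) -> dict:
    # Rough model of $_REQUEST: GET and POST values lumped into one dict,
    # so the handler can no longer tell where a value came from.
    merged = dict(query)
    merged.update(form)
    return merged


def send_message_request_style(method: str, query, form) -> Optional[Message]:
    # Like reading $_REQUEST: the HTTP method is effectively ignored.
    params = merged_params(query, form)
    if all(key in params for key in REQUIRED):
        return params["to"], params["subject"], params["message"]
    return None


def send_message_post_only(method: str, query, form) -> Optional[Message]:
    # Like reading $_POST: values in the URL query string are never used.
    if method != "POST":
        return None
    if all(key in form for key in REQUIRED):
        return form["to"], form["subject"], form["message"]
    return None


# The attacker's iframe issues a GET with everything in the query string.
attack = {"to": "someone", "subject": "SPAM", "message": "VIAGRA!"}
print(send_message_request_style("GET", attack, {}))  # message is sent
print(send_message_post_only("GET", attack, {}))      # nothing is sent
```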

5) Treat everything you're given as suspicious (or even malicious)
You have no idea what the user is sending you. It could be legitimate. It could be an attack. Never trust anything a user has sent you. Convert to correct types, validate the inputs, use whitelists to filter where necessary (avoid blacklists). This includes anything sent via $_GET, $_POST, $_COOKIE and $_FILES.
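As one possible illustration of converting types and whitelisting, here is a Python sketch for a hypothetical listing page with page and sort parameters. The sort value needs a whitelist because column names cannot be bound as prepared-statement parameters:

```python
ALLOWED_SORT_COLUMNS = {"name", "created_at"}  # hypothetical whitelist
MAX_PAGE = 1000                                  # hypothetical upper bound


def clean_listing_params(raw: dict) -> dict:
    """Convert and whitelist untrusted query parameters, falling back to defaults."""
    # Type conversion plus range check for the page number.
    try:
        page = int(raw.get("page", "1"))
    except (TypeError, ValueError):
        page = 1
    page = min(max(page, 1), MAX_PAGE)

    # Whitelist: anything that is not a known column name is rejected.
    sort = raw.get("sort", "name")
    if sort not in ALLOWED_SORT_COLUMNS:
        sort = "name"

    return {"page": page, "sort": sort}


print(clean_listing_params({"page": "2", "sort": "created_at"}))
print(clean_listing_params({"page": "x", "sort": "name; DROP TABLE users"}))
```

Falling back to a default is one design choice; rejecting the request with an error is equally valid.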

If you follow these guidelines, you'll be in a reasonable position in terms of security.

Answer 2:

You need to distinguish between two types of attacks: SQL injection and XSS. SQL injection can be avoided by using prepared statements or the quoting functions of your database library (such as PDO::quote()). If you use a quoting function, apply it right before inserting the value into the database.

XSS can be avoided by escaping all special characters with htmlspecialchars(). It is considered good style to store the original input in the database and escape it on output, after you read it back from the database. This way, when you use the input in other contexts where HTML escaping is not needed (a plain-text email, a JSON-encoded string), you still have the original input from the user.
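A small sketch of the "store raw, escape on output" approach, again using Python's sqlite3, html and json modules as stand-ins for PDO and htmlspecialchars() (the comments table is invented):

```python
import html
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)")

user_input = 'Tom & Jerry <3 "cartoons"'

# Store exactly what the user typed; SQL safety comes from the placeholder.
conn.execute("INSERT INTO comments (body) VALUES (?)", (user_input,))
stored = conn.execute("SELECT body FROM comments WHERE id = 1").fetchone()[0]

# Escape only at output time, for the specific output context.
as_html = html.escape(stored)            # for an HTML page
as_json = json.dumps({"body": stored})   # for a JSON response: no HTML entities
as_text = stored                         # for a plain-text email: unchanged

print(as_html)
print(as_json)
```

Had the text been HTML-escaped before storage, the JSON and email outputs would contain stray &amp; and &quot; entities.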

Also see this answer to a similar question.

Answer 3:

There are two simple things you need to do to be safe: