I've been hunting around the net now for a few days trying to figure this out but getting conflicting answers.
Is there a library, class or function for PHP that securely sanitizes/encodes a string against XSS? It needs to be updated regularly to counter new attacks.
I have a few use cases:
Use case 1) I have a plain text field, say for a First Name or Last Name
- User enters text into field and submits the form
- Before this is saved to the database I want to a) trim any whitespace off the front and end of the string, and b) strip all HTML tags from the input. It's a name text field, they shouldn't have any HTML in it.
- Then I will save this to the database with PDO prepared statements.
I'm thinking I could just do trim() and strip_tags(), then use a sanitize filter or a regex with a whitelist of characters. Do they really need characters like !, ?, < or > in their name? Not really.
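To make use case 1 concrete, here is a rough sketch of what I have in mind. The function name clean_name() and the exact whitelist (Unicode letters, combining marks, spaces, apostrophes and hyphens) are my own assumptions, not taken from any library, and the whitelist would probably need tuning for real names.

```php
<?php
// Hypothetical helper: clean_name() is an illustrative name, not a library function.
function clean_name(string $input): string
{
    // Drop anything that looks like an HTML tag (the text between tags is kept).
    $value = strip_tags($input);

    // Assumed whitelist: letters, combining marks, space, apostrophe, hyphen.
    // With the /u flag preg_replace() returns null on invalid UTF-8 input.
    $value = preg_replace("/[^\\p{L}\\p{M} '\\-]/u", '', $value);

    // Trim last so that removed punctuation cannot leave stray whitespace behind.
    return trim($value ?? '');
}

var_dump(clean_name("  <b>Anne-Marie</b> O'Neil!?  "));
// string(17) "Anne-Marie O'Neil"
```

The result would then go to the database through a PDO prepared statement as described above; this step only tidies the value and is not meant as the XSS defence for output.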
Use case 2) When outputting the contents from a previously saved database record (or from a previously submitted form) to the View/HTML I want to thoroughly clean it for XSS. NB: It may or may not have gone through the filtering step in use case 1 as it could be a different type of input, so assume no sanitizing has been done.
Initially I thought HTML Purifier would do the job, but it seems it is not what I need. When I posed the question to their support, this was the reply:
"Here's the litmus test: if a user submits <b>foo</b>, should it show up as <b>foo</b> [the literal text] or foo [in bold]? If the former, you don't need HTML Purifier."
So I'd rather it showed up as the literal text <b>foo</b>, because I don't want any HTML rendered for a simple text field, or any JavaScript executing.
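In code, the behaviour I'm after looks roughly like the sketch below. The helper name escape_html() is made up for illustration, and adding ENT_SUBSTITUTE is my own assumption about how invalid UTF-8 should be handled. It only covers values echoed into HTML text or quoted attribute values.

```php
<?php
// Hypothetical helper: escape_html() is an illustrative name, not a framework function.
function escape_html(string $value): string
{
    // ENT_QUOTES escapes both single and double quotes.
    // ENT_SUBSTITUTE replaces invalid UTF-8 sequences with U+FFFD
    // instead of making the whole result an empty string.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

echo escape_html('<b>foo</b>'), "\n";
// prints: &lt;b&gt;foo&lt;/b&gt;
```

A browser receiving that output displays the characters <b>foo</b> as text rather than rendering bold.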
So I've been hunting around for a function that will do it all for me. I stumbled across the xss_clean method used by Kohana 3.0, which I'm guessing works, but it's only for when you want to keep the HTML. It has been deprecated as of Kohana 3.1 in favour of HTML Purifier. So I'm guessing you're supposed to use HTML::chars() instead, which only does this:
    public static function chars($value, $double_encode = TRUE)
    {
        return htmlspecialchars( (string) $value, ENT_QUOTES, Kohana::$charset, $double_encode);
    }
Apparently you're supposed to use htmlentities instead, as mentioned in quite a few places on Stack Overflow, because it's supposedly more secure than htmlspecialchars.
- So how do I use htmlentities properly?
- Is that all I need?
- How does it protect against hex, decimal and base64 encoded values being sent from the attacks listed here?
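To check my own assumptions on those questions, I put together the small experiment below. The sample strings are ones I made up, not taken from the linked attack list. It compares the two functions on ASCII-only input and shows what happens to hex and decimal character references. It says nothing about base64 data or about other contexts such as URLs or inline JavaScript.

```php
<?php
$samples = [
    '<script>alert(1)</script>',     // plain markup
    '&#x3C;script&#x3E;',            // hex character references
    '&#60;script&#62;',              // decimal character references
    '" onmouseover=\'alert(1)\'',    // attempt to break out of an attribute
];

foreach ($samples as $s) {
    $special  = htmlspecialchars($s, ENT_QUOTES, 'UTF-8');
    $entities = htmlentities($s, ENT_QUOTES, 'UTF-8');

    // For these ASCII-only inputs both functions return the same string.
    var_dump($special === $entities);
    echo $special, "\n";
}

// The two functions differ on characters that have named entities, such as U+00E9.
var_dump(htmlspecialchars("\u{00E9}", ENT_QUOTES, 'UTF-8') === "\u{00E9}"); // bool(true)
var_dump(htmlentities("\u{00E9}", ENT_QUOTES, 'UTF-8') === '&eacute;');     // bool(true)
```

In each sample the leading & of a character reference is itself escaped to &amp;, so the browser shows the reference as text instead of decoding it into a < character.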
Now I see that the third parameter of htmlentities is the charset used in the conversion. My site and database are in UTF-8, but perhaps the submitted form data was not UTF-8 encoded; maybe they submitted ASCII or hex. So maybe I need to convert it to UTF-8 first? That would mean some code like:
    $encoding = mb_detect_encoding($input);
    $input = mb_convert_encoding($input, 'UTF-8', $encoding);
    $input = htmlentities($input, ENT_QUOTES, 'UTF-8');
Yes or no? Even then, I'm still not sure how to protect against possible XSS inputs encoded as hex, decimal or base64.
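For comparison, here is a stricter variant I've been considering: instead of guessing the encoding, check that the input already is valid UTF-8 and reject it otherwise. The helper name require_utf8() and the reject-on-invalid policy are assumptions of mine, and it needs the mbstring extension just like the snippet above.

```php
<?php
// Hypothetical helper: the name and the "return null on invalid input" policy are assumptions.
function require_utf8(string $input): ?string
{
    // mb_check_encoding() returns true only if the bytes are valid in the given encoding.
    return mb_check_encoding($input, 'UTF-8') ? $input : null;
}

var_dump(require_utf8("Zo\u{00EB}"));  // valid UTF-8, returned unchanged
var_dump(require_utf8("Zo\xEB"));      // lone ISO-8859-1 byte, not valid UTF-8: NULL
```

Anything that passes could then go through the escaping call with 'UTF-8' as the charset, without a detection step in between.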
If there's some library or open source PHP framework that can do XSS protection properly I'd be interested to see how they do it in code.
Any help much appreciated, sorry for the long post!