I already know how XSS works, but trying to enumerate all the many different ways malicious input can be injected is not realistic.
I saw a couple of libraries out there, but most of them are very incomplete, inefficient, or GPL-licensed (when will you guys learn that the GPL is not a good fit for sharing little libraries! Use MIT).
In addition to zerkms's answer, if you find you need to accept user-submitted HTML (from a WYSIWYG editor, for example), you will need to use an HTML parser to determine what can and can't be submitted.
I use and recommend HTML Purifier.
Note: Don't even try to use regex :)
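To make the recommendation concrete, here is a minimal sketch of typical HTML Purifier usage. It assumes the library is installed (e.g. via Composer as `ezyang/htmlpurifier`); the class and config names are from its documented API, and the tag whitelist is just an illustrative choice:

```php
<?php
// Assumes HTML Purifier is installed, e.g.: composer require ezyang/htmlpurifier
require_once 'vendor/autoload.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('Core.Encoding', 'UTF-8');
// Whitelist only the tags/attributes you actually want to allow.
$config->set('HTML.Allowed', 'p,b,i,a[href],ul,ol,li');

$purifier = new HTMLPurifier($config);

$dirty = '<p>Hello</p><script>alert("xss")</script>';
$clean = $purifier->purify($dirty); // the script tag is removed entirely
```

Because it actually parses the markup, it handles malformed tags and attribute-based vectors that regex-based filters miss.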
I like htmlpurifier fine, but I see how it could be inefficient, since it's fairly large. Also, it's LGPL, and I don't know if that falls under your GPL ban.
OWASP offers an encoding library into which real effort has gone to handle the various edge cases.
Obsolete: http://www.owasp.org/index.php/Category:OWASP_Encoding_Project
Now at http://code.google.com/p/reform/
and OWASP's antiXSS specific library is at: http://code.google.com/p/php-antixss/
Edit: Thank you @mario for pointing out that it all depends on the context. There really is no single way to prevent it all on all occasions. You have to adjust accordingly.
Edit: I stand corrected and am very appreciative of both @bobince's and @Rook's input on this issue. It's pretty much clear to me now that `strip_tags` will not prevent XSS attacks in any way. I scanned all my code prior to answering to see if I was in any way exposed, and all is good thanks to the `htmlentities($a, ENT_QUOTES)` I've been using mainly to cope with W3C. That said, I've updated the function below to somewhat mimic the one I use. I still find `strip_tags` nice to have before `htmlentities`, so that when a user does try to enter tags they will not pollute the final outcome. Say the user entered `<b>ok!</b>`: it's much nicer to show it as `ok!` than to print the full htmlentities-converted text. Thank you both very much for taking the time to reply and explain.
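A sketch of the kind of helper described above. The function name `sanitize_for_output` is hypothetical (the answer's actual function isn't reproduced here); the two calls mirror the described order, `strip_tags` first, then `htmlentities` with `ENT_QUOTES`:

```php
<?php
// Hypothetical helper mimicking the approach described above:
// strip tags first so stray markup doesn't pollute the output,
// then entity-encode whatever remains (ENT_QUOTES covers ' and ").
function sanitize_for_output(string $input): string
{
    $stripped = strip_tags($input);
    return htmlentities($stripped, ENT_QUOTES, 'UTF-8');
}

echo sanitize_for_output('<b>ok!</b>');      // ok!
echo sanitize_for_output("it's \"quoted\""); // it&#039;s &quot;quoted&quot;
```

Note this is for display only; as discussed above, `strip_tags` by itself is not an XSS defense, the encoding step is what makes the output safe.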
If it's coming from an internet user:
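A minimal sketch of the standard approach for that case, plain output-encoding with `htmlspecialchars` (UTF-8 assumed):

```php
<?php
// Encode on output so the browser treats the value as text, not markup.
$userInput = '<script>alert("xss")</script>';
echo htmlspecialchars($userInput, ENT_QUOTES, 'UTF-8');
// &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;
```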
If it's coming from the backoffice... don't.
There are perfectly valid reasons why someone at the company may need JavaScript for this or that page. It's much better to be able to log and blame than to shut down your users.
HTML Purifier is the undisputed best option for sanitizing HTML input, and htmlspecialchars should be applied to anything else.
But XSS attempts should not be cleaned up and kept, because any such submission is garbage anyway. Rather, make your application bail and write a log entry. The best filter set for XSS detection is in the mod_security core rules.
I'm using an inconspicuous but quite thorough attribute detection here in new input(), see the _xss method.
I'm surprised it hasn't been mentioned here, but I prefer htmLawed to HTML Purifier. It's up to date, nicely licensed, very small, and really fast.
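For reference, typical htmLawed usage looks roughly like this. It assumes the single-file library has been downloaded alongside the script; `safe => 1` is its documented anti-XSS profile, and the element whitelist is an illustrative choice:

```php
<?php
// htmLawed is a single-file library; include it directly.
require_once 'htmLawed.php';

$config = array(
    'safe'     => 1,                      // harden against script/CSS XSS vectors
    'elements' => 'a, em, strong, p, br', // whitelist of allowed tags
);

$dirty = '<p onclick="alert(1)">hi</p><script>alert("xss")</script>';
$clean = htmLawed($dirty, $config); // event handler and script tag are filtered
```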