I am trying to protect my website from Cross-Site Scripting (XSS) and I'm thinking of using regular expressions to validate user inputs.
Here is my question: I have a list of dangerous HTML tags...
<applet>
<body>
<embed>
<frame>
<script>
<frameset>
<html>
<iframe>
<img>
<style>
<layer>
<link>
<ilayer>
<meta>
<object>
...and I want to include them in regular expressions - is this possible? If not, what should I use? Do you have any ideas how to implement something like that?
You should HTML-encode the string before writing it to the page. .NET has a built-in method for this.
More details: http://msdn.microsoft.com/en-us/library/73z22y6h.aspx
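As a rough illustration of what HTML-encoding does, here is a minimal sketch in Python (chosen only for brevity; in .NET, `HttpUtility.HtmlEncode` plays the same role). It uses the standard library's `html.escape`. The `render_comment` helper and its `<p>` template are hypothetical, not part of any framework.

```python
import html


def render_comment(user_input: str) -> str:
    """Hypothetical helper: embed untrusted text in an HTML fragment."""
    # quote=True also escapes " and ', which matters inside attribute values.
    return "<p>" + html.escape(user_input, quote=True) + "</p>"


print(render_comment('<script>alert("xss")</script>'))
# <p>&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;</p>
```

The browser displays the encoded text literally instead of parsing it as a tag, so no list of dangerous tags is needed.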
Blacklisting as sanitization is not effective, as has already been discussed. Think about what happens to your blacklist when someone submits crafted input:
<SCRIPT>
<ScRiPt>
< S C R I P T >
<scr\0ipt>
<scr<script>ipt>
(did you apply the blacklist recursively? ;-) )
This is not an exhaustive list of attacks, just a few examples of how a blacklist can be defeated. Variants like these can slip past a filter and still end up as a working tag in the browser.
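To make the nested-tag case concrete, here is a small sketch in Python of a single-pass regex blacklist. The pattern and the `naive_strip` helper are hypothetical stand-ins for whatever filter you might write. It is deliberately naive and only shows the failure mode.

```python
import re

# Hypothetical blacklist: strip <script> and </script>, ignoring case.
# (Real tags can also carry attributes, which this pattern would miss entirely.)
NAIVE_SCRIPT_TAG = re.compile(r"</?script>", re.IGNORECASE)


def naive_strip(text: str) -> str:
    """Remove every match in one pass; the result is not re-scanned."""
    return NAIVE_SCRIPT_TAG.sub("", text)


print(naive_strip("<ScRiPt>alert(1)</ScRiPt>"))
# alert(1)

print(naive_strip("<scr<script>ipt>alert(1)</scr</script>ipt>"))
# <script>alert(1)</script>
```

Removing the inner tag reassembles the outer one, so the filter itself produces the payload it was meant to block.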
Please read over the OWASP XSS (Cross Site Scripting) Prevention Cheat Sheet for a broad array of information. Blacklisting tags is not an effective way to do it and will leave gaps. You should filter input, sanitize before outputting to the browser, encode HTML entities, and apply the other techniques discussed in the cheat sheet.
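If you do want regular expressions for the input-filtering part, one common approach is to describe what is allowed rather than what is forbidden. The sketch below is in Python and assumes a hypothetical username field limited to 3-20 letters, digits, or underscores. The rule and the `is_valid_username` name are illustrative only.

```python
import re

# Assumption: usernames are 3-20 characters from [A-Za-z0-9_]; adjust per field.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,20}")


def is_valid_username(value: str) -> bool:
    """Accept the value only if the entire string matches the allow-list."""
    # fullmatch requires the whole string to match, not just a prefix.
    return USERNAME_RE.fullmatch(value) is not None


print(is_valid_username("alice_42"))          # True
print(is_valid_username("<script>alert(1)"))  # False
```

This kind of validation complements output encoding rather than replacing it, since free-text fields such as comments cannot be restricted this tightly.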