I have an ASP.NET MVC application and I'm using CKEditor for text entry. I have turned off input validation so the HTML created from CKEditor can be passed into the controller action. I am then showing the entered HTML on a web page.
I only have certain buttons on CKEditor enabled, but obviously someone could send whatever text they want down. I want to be able to show the HTML on the page after the user has entered it. How can I validate the input, but still be able to show the few things that are enabled in the editor?
So basically I want to sanitize everything except for a few key things like bold, italics, lists and links. This needs to be done server side.
See my full answer here from a similar question:
I have found that replacing the angle brackets with encoded angle brackets solves most problems
You could create a "whitelist" of sorts for the HTML tags you'd like to allow. You could start by HTML encoding the whole thing. Then, replace a series of "allowed" sequences, such as:
"&lt;strong&gt;" and "&lt;/strong&gt;" back to "<strong>" and "</strong>"
"&lt;em&gt;" and "&lt;/em&gt;" back to "<em>" and "</em>"
"&lt;li&gt;" and "&lt;/li&gt;" back to ... etc. etc.
For things like the A tag, you could resort to a regular expression (since you'd want the href attribute to be allowed too). You would still want to be careful about XSS; someone else already recommended AntiXSS.
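Before getting to links, here is a minimal sketch of the encode-then-restore step for the attribute-free tags. It is written in Python only to keep it short and runnable; in ASP.NET the same two steps would map onto HttpUtility.HtmlEncode followed by plain string replacements. The function name and the exact tag list are my own assumptions, not something from the question, so adjust them to whatever buttons you actually enabled.

```python
import html

# Hypothetical whitelist: attribute-free tags the editor is allowed to emit.
ALLOWED_TAGS = ("strong", "em", "ul", "ol", "li")


def sanitize_simple(raw: str) -> str:
    """Encode everything, then restore only exact whitelisted tags."""
    # Step 1: neutralise all markup (<, >, &, and quotes become entities).
    encoded = html.escape(raw, quote=True)
    # Step 2: turn back only the exact encoded forms of the allowed tags.
    for tag in ALLOWED_TAGS:
        encoded = encoded.replace(f"&lt;{tag}&gt;", f"<{tag}>")
        encoded = encoded.replace(f"&lt;/{tag}&gt;", f"</{tag}>")
    return encoded


if __name__ == "__main__":
    print(sanitize_simple("<strong>hi</strong><script>alert(1)</script>"))
```

Because only the exact strings are restored, a tag with any attribute on it (for example a strong tag carrying an onclick) stays encoded and is shown as text instead of being rendered.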
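Sample regexp to replace the A tags in the encoded text (if your encoder also turns double quotes into &quot;, as HttpUtility.HtmlEncode does, match &quot; in place of each literal quote):
&lt;a href="([^"]+)"&gt;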
Then replace as
<a href="$1">
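As a rough illustration of the link step, the sketch below applies the same idea in Python. It goes a bit further than the pattern above, and these are my assumptions rather than part of the original suggestion: the encoder is assumed to turn quotes into &quot;, and only http/https URLs are restored, so javascript: links stay encoded. The names LINK_OPEN and sanitize_links are made up for the example.

```python
import html
import re

# Matches an encoded opening A tag whose only attribute is an http(s) href.
# Inside the URL, the only entity allowed is &amp; (an encoded "&").
LINK_OPEN = re.compile(
    r"&lt;a href=&quot;(https?://[^\s&]+(?:&amp;[^\s&]+)*)&quot;&gt;"
)


def sanitize_links(raw: str) -> str:
    """Encode everything, then restore plain http(s) links only."""
    encoded = html.escape(raw, quote=True)
    # Restore the opening tag, reusing the still-encoded URL as the href value.
    encoded = LINK_OPEN.sub(r'<a href="\1">', encoded)
    # A closing tag carries no attributes, so an exact replace is enough.
    return encoded.replace("&lt;/a&gt;", "</a>")


if __name__ == "__main__":
    print(sanitize_links('<a href="http://example.com/?a=1&b=2">x</a>'))
    print(sanitize_links('<a href="javascript:alert(1)">x</a>'))
```

Anything that does not match the pattern exactly, such as an extra onclick attribute or a different URL scheme, is left encoded. A dedicated library like the AntiXSS one mentioned above is still the safer choice for anything beyond a handful of tags.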
Good luck!