I've often wondered -- why use a whitelist as opposed to a blacklist when sanitizing HTML input?
How many sneaky HTML tricks are there to open XSS vulnerabilities? Obviously script tags and frames are not allowed, and a whitelist would be used on the fields in HTML elements, but why disallow most of everything?
Because then you can be sure that you don't miss anything. By explicitly allowing only some tags, you obviously have more control over what gets through.
Whitelists are used in most security-related topics. Think about firewalls. The first rule is to block all (incoming) traffic and then open only the ports that are supposed to be open. This makes it far more secure.
The more you allow, the more tricks are left for clever hackers to inject nasty code into your webpage. That's why you want to allow as little as possible.
See Ruben van Vreeland's lecture "How We Hacked LinkedIn & What Happened Next" for a good introduction to XSS vulnerabilities and why you want your whitelist to be as strict as possible!
If you leave something off a whitelist, then you just break something that wasn't important enough for you to think about in the first place.
If you leave something off a blacklist, then you've opened a big security hole.
If browsers add new features, then your blacklist becomes out of date.
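To make the difference concrete, here is a minimal sketch using only Python's standard library. The two tag lists and the names TagFilter and filter_tags are made up for this illustration, and this is not a production sanitizer (it drops every attribute and keeps the text of removed tags as escaped text). It only shows what happens to a tag that nobody thought about.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical lists, chosen only for illustration.
BLACKLIST = {"script", "iframe", "frame"}
WHITELIST = {"b", "i", "em", "strong", "p"}


class TagFilter(HTMLParser):
    """Re-emit text and the tags accepted by is_allowed; drop all attributes."""

    def __init__(self, is_allowed):
        super().__init__(convert_charrefs=True)
        self.is_allowed = is_allowed
        self.out = []

    def handle_starttag(self, tag, attrs):
        if self.is_allowed(tag):
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if self.is_allowed(tag):
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is always kept, but escaped so it cannot introduce markup.
        self.out.append(escape(data))


def filter_tags(html, is_allowed):
    parser = TagFilter(is_allowed)
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.out)


sample = '<b>hi</b><object data="x"></object>'
by_blacklist = filter_tags(sample, lambda tag: tag not in BLACKLIST)
by_whitelist = filter_tags(sample, lambda tag: tag in WHITELIST)
```

Nobody put object on the blacklist, so by_blacklist still contains the object element, while by_whitelist is reduced to the bold text. The forgotten tag is a security hole in one case and a missing feature in the other.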
Just read something about that yesterday. It's in the manual of feedparser.
A snippet:
There is a serious risk if you only blacklist some elements and forget an important one. When you whitelist only tags you know are secure, there is less risk of letting in something that can be abused.
I prefer to have both; I call it the "Black List with Relaxed White List" approach:
This black list acts as an on-off switch for tags/attributes in the relaxed white list.
This "Black List with Relaxed White List" approach makes it much easier to configure the sanitizing filter.
As an example, the White List can contain all HTML5 tags and attributes, while the Black List contains the tags and attributes to be excluded.
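One way to read this approach is as a set difference: the effective whitelist is the relaxed white list minus the black list. The sketch below is only my reading of the idea, with abbreviated, hypothetical lists and made-up function names, not the configuration of any particular library.

```python
# Hypothetical relaxed white list: a broad set of HTML5 tags (abbreviated here).
RELAXED_WHITELIST = {"a", "b", "i", "p", "div", "span", "img",
                     "style", "object", "video"}
# Hypothetical black list: switches off entries of the relaxed white list.
BLACKLIST = {"style", "object"}


def effective_whitelist(relaxed, blocked):
    """Return the tags that are in the relaxed list and not switched off."""
    return frozenset(relaxed) - frozenset(blocked)


def is_allowed(tag, allowed):
    """Tag names are case-insensitive in HTML, so compare in lowercase."""
    return tag.lower() in allowed


allowed = effective_whitelist(RELAXED_WHITELIST, BLACKLIST)
```

Note that the result is still a whitelist: a tag that appears in neither list, such as marquee here, is rejected by default. The black list only makes it convenient to turn individual entries off.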
Because other tags can break the layout of a page. Imagine what would happen if someone injects a <style> tag. The <object> tag is also dangerous.