I have an entertainment website, and I am thinking of using a new method to prevent XSS attacks. I have created the following list of words:
alert(, javascript, <script>,<script,vbscript,<layer>,
<layer,scriptalert,HTTP-EQUIV,mocha:,<object>,<object,
AllowScriptAccess,text/javascript,<link>, <link,<?php, <?import,
Because my site is about entertainment, I do not expect a normal (non-malicious) user to use words like these in a comment. So I have decided to remove all of the comma-separated words above from any user-submitted string. I need your advice: after doing this, do I still need to use a tool like HTML Purifier?
Note: I am not using htmlspecialchars() because it would also convert the tags generated by my rich text editor (CKEditor), so the user's formatting would be lost.
Hacks to circumvent your list aside, it's always better to use a whitelist than a blacklist.
In this case, you already have a clear list of tags that you want to support, so just whitelist tags like <em>, <b>, etc., using some HTML purifier.

Why not just make a function that reverts the changes htmlspecialchars() made for the specific tags you want to be available, such as <b>, <i>, <a>,
etc.?

Using a blacklist is a bad idea because it is simple to circumvent. For example, you are checking for, and presumably removing, <script>. To circumvent this, a malicious user can nest one tag inside another, entering something like <scr<script>ipt>: your code will strip out the middle <script>, leaving the outer <script> intact and saved to the page.

If you need to enter HTML and your users do not, then prevent them from entering HTML. You need a separate method, only accessible to you, for entering articles with HTML.
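To make the nesting trick concrete, here is a minimal sketch. It assumes the filter is a single-pass string removal; the function name strip_blacklisted() and the shortened two-entry list are hypothetical, not the asker's actual code.

```php
<?php
// Hypothetical single-pass blacklist filter, in the spirit of the one
// described in the question. The name and the shortened list are illustrative.
function strip_blacklisted(string $input): string
{
    $blacklist = ['<script>', 'javascript'];
    // str_ireplace() removes the matches it finds while scanning left to
    // right; it does not rescan the text produced by a removal.
    return str_ireplace($blacklist, '', $input);
}

// The attacker wraps one blacklisted tag around another.
$payload = '<scr<script>ipt>evil()</scr<script>ipt>';

// Removing the inner tags reassembles the outer ones.
echo strip_blacklisted($payload), "\n"; // prints <script>evil()</script>
```

Running the filter in a loop until nothing changes would close this particular hole, but it would not address the other evasion routes a blacklist leaves open.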
This approach misunderstands what the HTML-injection problem is, and is utterly ineffective.
There are many, many more ways to put scripting in HTML than the above list, and many ways to evade the filter by using escaped forms. You will never catch all potential "harmful" constructs with this kind of naive sequence blacklisting, and if you try you will inconvenience users with genuine comments (e.g. by banning the use of words beginning with on...).

The correct way to prevent HTML-injection XSS is:
- use htmlspecialchars() when outputting content that is supposed to be normal text (which is the vast majority of content);
- if you need to allow user-supplied HTML markup, whitelist the harmless tags and attributes you wish to allow, and enforce that using HTMLPurifier or another similar library.
This is a standard and well-understood part of writing a web application, and is not difficult to implement.
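As a rough sketch of those two cases: the helper names e() and purify_rich_text() and the allowed-tag list are assumptions made for illustration, and the second function only works if the HTML Purifier library has been installed and loaded (for example through Composer's autoloader).

```php
<?php
// Case 1: plain text. Escape it at output time.
// The helper name e() is an assumption made for this sketch.
function e(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Case 2: rich text (e.g. CKEditor output). Enforce a whitelist.
// Assumes HTML Purifier is loaded; the allowed list below is illustrative
// and should match the formatting your editor actually produces.
function purify_rich_text(string $dirtyHtml): string
{
    $config = HTMLPurifier_Config::createDefault();
    $config->set('HTML.Allowed', 'p,br,b,i,em,strong,a[href]');
    $purifier = new HTMLPurifier($config);
    return $purifier->purify($dirtyHtml);
}

// Plain-text output: markup in the comment is shown, not executed.
echo '<p>' . e('<script>evil()</script> & "quotes"') . "</p>\n";
// prints <p>&lt;script&gt;evil()&lt;/script&gt; &amp; &quot;quotes&quot;</p>
```

With this split, the CKEditor content goes through purify_rich_text() before it is stored or displayed, while every other user-supplied value goes through e() at the point of output.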
You can try htmlentities(), strip_tags(), mysql_real_escape_string(), or a simple function