If the following statements are true,
- All documents are served with the HTTP header Content-Type: text/html; charset=UTF-8.
- All HTML attributes are enclosed in either single or double quotes.
- There are no <script> tags in the document.
are there any cases where htmlspecialchars($input, ENT_QUOTES, 'UTF-8') (converting &, ", ', <, > to the corresponding named HTML entities) is not enough to protect against cross-site scripting when generating HTML on a web server?
htmlspecialchars() is enough to prevent document-creation-time HTML injection under the limitations you state (i.e. no injection into the tag itself or into an unquoted attribute).
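As a minimal sketch of the safe case described here, assuming the value lands inside a double-quoted attribute ($input is a made-up attack string, not something from the question):

```php
<?php
// Hypothetical attacker-controlled value that tries to break out of the attribute.
$input = '"><img src=x onerror=alert(1)>';

// Escape &, ", ', < and > so the value cannot terminate the quoted attribute.
$safe = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// The attribute value is enclosed in double quotes, as the question requires.
echo '<input type="text" value="' . $safe . '">' . "\n";
```

Because the quote and angle brackets are encoded, the browser treats the whole string as attribute text rather than as new markup.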
However, there are other kinds of injection that can lead to XSS, and the condition "There are no <script> tags in the document" doesn't cover all cases of JS injection. You might, for example, have an event handler attribute (requires JS-escaping inside HTML-escaping):
<div onmouseover="alert('<?php echo htmlspecialchars($xss) ?>')"> // bad!
or, even worse, a javascript: link (requires JS-escaping inside URL-escaping inside HTML-escaping):
<a href="javascript:alert('<?php echo htmlspecialchars($xss) ?>')"> // bad!
It is usually best to avoid these constructs anyway, but especially when templating. Writing <?php echo htmlspecialchars(urlencode(json_encode($something))) ?> is quite tedious.
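If an event handler attribute really cannot be avoided, the layered escaping could look roughly like the sketch below. js_in_attr() is a hypothetical helper name, and the JSON_HEX_* flags are one possible way to do the JS-escaping step, not the only one:

```php
<?php
// Hypothetical helper: JS-escape first (inner layer), then HTML-escape (outer layer).
function js_in_attr($value): string
{
    // json_encode() produces a JavaScript literal; the JSON_HEX_* flags turn
    // <, >, &, ' and " inside the value into \uXXXX escapes.
    $js = json_encode($value, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);
    if ($js === false) {
        throw new InvalidArgumentException('Value cannot be encoded as JSON');
    }
    // The quotes that delimit the JS string still need HTML-escaping
    // before they can sit inside a quoted attribute.
    return htmlspecialchars($js, ENT_QUOTES, 'UTF-8');
}

$xss = "'); alert(document.cookie); ('";
// json_encode() supplies the string delimiters, so none are written in the template.
echo '<div onmouseover="alert(' . js_in_attr($xss) . ')">hover me</div>' . "\n";
```

Note that the template no longer writes its own quotes around the value, which removes one place where the two escaping layers can get out of step.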
And... injection issues can happen on the client side as well (DOM XSS); htmlspecialchars() won't protect you against a piece of JavaScript writing to innerHTML (commonly .html() in poor jQuery scripts) without explicit escaping.
And... XSS has a wider range of causes than just injections. Other common causes are:
- allowing the user to create links without checking for known-good URL schemes (javascript: is the most well-known harmful scheme, but there are more)
- deliberately allowing the user to create markup, either directly or through light-markup schemes (like bbcode, which is invariably exploitable)
- allowing the user to upload files (which can, through various means, be reinterpreted as HTML or XML)
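For the first cause, a known-good scheme check might be sketched as follows. is_safe_link() and render_link() are hypothetical names, and allowing only absolute http/https URLs is an assumption; a real application may need mailto: or relative links as well:

```php
<?php
// Hypothetical helper: accept only absolute http(s) URLs (an allowlist, not a blocklist).
function is_safe_link(string $url): bool
{
    // Reject whitespace and control characters, which browsers may strip
    // or ignore when they parse the scheme.
    if (preg_match('/[\x00-\x20\x7f]/', $url)) {
        return false;
    }
    // The URL must start with a known-good scheme.
    return preg_match('#^https?://#i', $url) === 1;
}

// Hypothetical helper: fall back to a harmless target when the URL is rejected.
function render_link(string $url, string $label): string
{
    $href = is_safe_link($url) ? $url : '#';
    return '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">'
         . htmlspecialchars($label, ENT_QUOTES, 'UTF-8') . '</a>';
}

echo render_link('javascript:alert(1)', 'click me') . "\n";
echo render_link('https://example.com/?a=1&b=2', 'A & B') . "\n";
```

The scheme check and htmlspecialchars() do different jobs here: the first decides whether the URL is acceptable at all, the second only makes it safe to place inside the quoted attribute.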
Assuming you are not using older PHP versions (5.2 or so), htmlspecialchars() is "safe" (and, of course, taking the backend code into consideration, as @Royal Bg mentions).
In older PHP versions, malformed UTF-8 sequences made this function vulnerable (http://www.securityfocus.com/bid/37389).
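On current PHP versions, malformed input is handled explicitly rather than passed through. A small sketch, assuming PHP 5.4 or later (where ENT_SUBSTITUTE exists) and a made-up input containing the invalid byte 0xFF:

```php
<?php
// Hypothetical input containing a byte that is never valid in UTF-8.
$malformed = "a\xFFb";

// Without ENT_SUBSTITUTE or ENT_IGNORE, invalid input yields an empty string.
$strict = htmlspecialchars($malformed, ENT_QUOTES, 'UTF-8');

// With ENT_SUBSTITUTE, the invalid sequence becomes U+FFFD (the replacement character).
$lenient = htmlspecialchars($malformed, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($strict === '');
var_dump($lenient === "a\u{FFFD}b");
```

Which behavior is preferable depends on the application: an empty result fails closed, while substitution keeps the rest of the text readable.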
My 2 cents: always sanitize/check your inputs by specifying what is allowed, instead of just escaping/encoding everything.
E.g. if someone must enter a telephone number, I can imagine the following characters being allowed: 0123456789()+-. and a space, while all others are just ignored / stripped out.
The same would apply to addresses etc.; someone specifying UTF-8 characters for dots/blocks/hearts etc. in their address must be mentally ill...
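The telephone-number allowlist above could be sketched like this. clean_phone() is a hypothetical name and the character set is taken directly from the list in this answer:

```php
<?php
// Hypothetical helper: keep only digits, parentheses, plus, minus, dot and space.
function clean_phone(string $input): string
{
    // Everything outside the allowlist is stripped out.
    return preg_replace('/[^0-9()+\-. ]/', '', $input);
}

echo clean_phone('+31 (0)20-555.1234<script>') . "\n"; // prints "+31 (0)20-555.1234"
```

This kind of input filtering narrows what gets stored; the value would presumably still go through htmlspecialchars() when it is written into HTML.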
As far as I know, yes. I can't imagine a case where it doesn't prevent XSS. If you want to be completely safe, use strip_tags().