I have been reading up on htmlspecialchars() for escaping user input and user input from the database. Before anyone says anything, yes, I am filtering on db input as well as using prepared statements with bindings. I am only concerned about securing the output.
I am confused as to when to use ENT_COMPAT, ENT_QUOTES, and ENT_NOQUOTES. I came across the following excerpt while doing my research:
The second argument in the htmlspecialchars() call is ENT_COMPAT. I've used that because it's a safe default: it will also escape double-quote characters ("). You only really need to do that if you're outputting inside an HTML attribute (like <img src="<?php echo htmlspecialchars($img_path, ENT_COMPAT, 'UTF-8') ?>">). You could use ENT_NOQUOTES everywhere else.
I have found similar comments elsewhere as well. What is the purpose of converting single and/or double quotes for attributes yet not converting them elsewhere? The only thing I can think of is if you were adding actual html into the page for instance:
My variable is : <img src="somepic.jpg" alt="some text">
If you converted the double quotes here, it would not render properly because of the escaped quotes. In the example given in the excerpt, though, I can't even think of an instance where any type of quote would be used.
Secondly, in this particular reference it says to use ENT_NOQUOTES everywhere else. Why? My personal thought process is telling me to use ENT_QUOTES everywhere and ENT_NOQUOTES if and only if the variable is an actual HTML attribute that requires them.
I've done lots of searching and reading, but I am still confused about all of this. My main goal is to secure output to the page so there is no HTML, PHP, or JS manipulation happening.
Within HTML there are different contexts in which different characters are considered special. For example, within a double-quoted attribute value, a literal double quote would be interpreted as the closing delimiter of the attribute value.
In such a case the double quote needs to be encoded using a character reference. Single-quoted attribute values are similar, but there the first literal single quote is considered the end delimiter of the attribute value.
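As a minimal sketch of the attribute case, the PHP below builds an input tag from a made-up attacker-supplied value and compares the flags. The variable names and the payload are assumptions chosen for illustration, not part of the original answer.

```php
<?php
// Hypothetical payload that tries to close a double-quoted attribute early.
$payload = '" onmouseover="alert(1)';

// ENT_NOQUOTES leaves the double quote literal, so the attribute value ends early.
$unsafe = '<input value="' . htmlspecialchars($payload, ENT_NOQUOTES, 'UTF-8') . '">';
// ENT_COMPAT encodes the double quote as &quot;, so the value stays inside the attribute.
$safe = '<input value="' . htmlspecialchars($payload, ENT_COMPAT, 'UTF-8') . '">';

echo $unsafe, "\n"; // <input value="" onmouseover="alert(1)">
echo $safe, "\n";   // <input value="&quot; onmouseover=&quot;alert(1)">

// The same problem with a single-quoted attribute: ENT_COMPAT is not enough here.
$payloadSingle = "' onmouseover='alert(1)";

$singleUnsafe = "<input value='" . htmlspecialchars($payloadSingle, ENT_COMPAT, 'UTF-8') . "'>";
$singleSafe = "<input value='" . htmlspecialchars($payloadSingle, ENT_QUOTES, 'UTF-8') . "'>";

echo $singleUnsafe, "\n"; // <input value='' onmouseover='alert(1)'>
echo $singleSafe, "\n";   // <input value='&#039; onmouseover=&#039;alert(1)'>
```

Only ENT_QUOTES covers both quoting styles, which is why it is the safer choice when you do not control how every attribute in a template is quoted.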
The same applies to the data context, i.e., outside a tag. There, the only character that would be considered harmful with regard to Cross-Site Scripting is <, as it would switch to the tag open context. So it needs to be encoded using a character reference to avoid the injection of a tag.
However, it is also allowed to use character references instead of the literal characters even when they are not special in the corresponding context, or not special at all. For example, in the data context a literal " and the character reference &quot; are equivalent.
So only certain special characters really need to be encoded as character references, depending on the context, but it does no harm to also encode characters that are special only in other contexts.
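To illustrate that extra encoding is harmless, here is a small sketch that escapes the same text minimally and fully, then decodes both results. The sample string and variable names are assumptions for illustration; html_entity_decode() stands in for what a browser does when it parses character references.

```php
<?php
// Hypothetical text destined for the data context (between tags).
$text = 'It\'s "5 < 6" & more';

// Minimal escaping: only &, < and > are encoded.
$minimal = htmlspecialchars($text, ENT_NOQUOTES, 'UTF-8');
// Full escaping: both kinds of quotes are encoded as well.
$full = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

echo $minimal, "\n"; // It's "5 &lt; 6" &amp; more
echo $full, "\n";    // It&#039;s &quot;5 &lt; 6&quot; &amp; more

// The two strings differ, but both decode back to the same original text.
$fromMinimal = html_entity_decode($minimal, ENT_QUOTES, 'UTF-8');
$fromFull = html_entity_decode($full, ENT_QUOTES, 'UTF-8');

var_dump($fromMinimal === $text); // bool(true)
var_dump($fromFull === $text);    // bool(true)
```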
Just use ENT_QUOTES everywhere. PHP gives the option in case you need it, but 99% of the time you don't. Escaping the quotes unnecessarily is harmless.
Because that code is just too long to keep writing everywhere, wrap it in some tiny function.
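Such a wrapper could look something like the sketch below. The function name e() and the sample values are assumptions, so pick whatever name fits your codebase.

```php
<?php
// Hypothetical helper: escapes a string for output in HTML text or in a quoted attribute value.
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Made-up sample values standing in for user input or database content.
$title = 'Tom\'s "favourite" picture';
$comment = '<script>alert(1)</script>';

echo '<p title="', e($title), '">', e($comment), '</p>', "\n";
```

Note that this only covers HTML text and quoted attribute values. Output placed inside a script block, a URL, or an unquoted attribute needs escaping suited to that context.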