When encoding possibly unsafe data, is there a reason to encode >?
- It validates either way.
- The browser interprets the same either way (in the cases of attr="data", attr='data', <tag>data</tag>).
I think the reasons somebody would do this are:
- To simplify regex-based tag removal: <[^>]+>? (rare)
- Non-quoted strings: attr=data. :-o (not happening!)
- Aesthetics in the code. (so what?)
Am I missing anything?
Encoding HTML characters is always a delicate job. You should always encode what needs to be encoded, and always follow the standards. Using double quotes is standard, and even quotes inside double quotes should be encoded. ENCODE always. Imagine something like this:
Probably the img> will be parsed by the browser as an image tag. Browsers always try to resolve unclosed tags or quotes. As basile says, use standards; otherwise you could get unexpected results without understanding the source of the errors.
The HTML4 specification in its section 5.3.2 says that
so I believe you should encode the greater-than sign > as &gt; (because you should obey the standards).
Yes, because if the signs were not encoded, this would allow XSS on forms, social media, and many other sites, because an attacker can use a <script> tag. If you encode the signs, the browser will not execute it but will instead display the signs as text.
Current browsers' HTML parsers have no problems with unquoted >s.
However, unfortunately, using regular expressions to "parse" HTML in JS is pretty common (example: Ext.util.Format.stripTags). Also, poorly written command-line tools, IDEs, Java classes, etc. may not be sophisticated enough to determine the delimiter of an opening tag.
So, you may run into problems with code like this:
(Note how the syntax highlighter treats this snippet!)
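To make the regex problem concrete, here is a minimal Python sketch. It assumes a naive stripper built on the <[^>]+> pattern from the question; the function name strip_tags_naive and the sample markup are made up for illustration, and this is not a claim about how Ext.util.Format.stripTags is implemented.

```python
import re

# Naive tag stripper: assumes the first ">" after a "<" always closes the tag.
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags_naive(markup: str) -> str:
    """Remove anything that looks like a tag, using a regex only."""
    return TAG_RE.sub("", markup)


# Hypothetical input: a ">" left unencoded inside a quoted attribute value.
raw = '<a title="1 > 0">link</a>'
# The same markup with the ">" encoded as &gt;.
encoded = '<a title="1 &gt; 0">link</a>'

print(strip_tags_naive(raw))      # the tag is cut short at the first ">"
print(strip_tags_naive(encoded))  # the whole tag is removed
```

With the unencoded input, the leftover of the attribute value leaks into the "stripped" text; with the encoded input, only the link text remains.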
Always
This is to prevent XSS injections (through users using any of your forms to submit raw HTML or JavaScript). By escaping your output, the browser knows not to parse or execute any of it, only to display it as text.
This may feel like less of an issue if you're not dealing with dynamic output based on user input; however, it's important to at least understand it, if not to make it a good habit.
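As a rough illustration of escaping output, here is a Python sketch using the standard library's html.escape. The render_comment helper and the sample payload are hypothetical, chosen only to show the effect.

```python
from html import escape


def render_comment(user_text: str) -> str:
    """Wrap untrusted text in a paragraph, escaping it first."""
    # escape() replaces &, < and >, and by default both quote characters.
    return "<p>" + escape(user_text) + "</p>"


payload = "<script>alert(1)</script>"
print(render_comment(payload))
# <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```

The browser receives the entity references and shows the payload as literal text instead of running it.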
Strictly speaking, to prevent HTML injection, you need only encode < as &lt;.
If user input is going to be put in an attribute, also encode " as &quot;.
If you're doing things right and using properly quoted attributes, you don't need to worry about >. However, if you're not certain of this you should encode it just for peace of mind; it won't do any harm.
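The claim about properly quoted attributes can be checked with a small Python sketch. It assumes a hand-written encoder (encode_minimal, a hypothetical name) that leaves > alone, and uses the standard library's html.parser as a stand-in for a conforming parser; real browsers are not involved here. Encoding & as well is an addition to the list above, made so that literal ampersands survive the round trip.

```python
from html.parser import HTMLParser


def encode_minimal(value: str) -> str:
    """Encode &, < and the double quote, deliberately leaving > as-is."""
    # & is handled first (an addition to the answer's list) so that text
    # which already looks like an entity is not decoded into something else.
    return (value.replace("&", "&amp;")
                 .replace("<", "&lt;")
                 .replace('"', "&quot;"))


class AttrCollector(HTMLParser):
    """Record the attributes of every start tag the parser sees."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, dict(attrs)))


untrusted = 'a > b "quoted" <i>'
markup = '<span title="' + encode_minimal(untrusted) + '">x</span>'

collector = AttrCollector()
collector.feed(markup)
collector.close()
print(collector.seen)
```

The parser reports a single span tag whose title is the original untrusted string, even though the > inside the quoted attribute was never encoded.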