In my legacy project I can see escapeHtml being used before a string is sent to the browser.
StringEscapeUtils.escapeHtml(stringBody);
I know from the API doc what escapeHtml does. Here is the example given:
For example:
"bread" & "butter"
becomes:
&quot;bread&quot; &amp; &quot;butter&quot;.
My understanding is that when we send the string after escaping the HTML, it is the browser's responsibility to convert it back to the original characters. Is that right?
But I don't understand why and when escaping is required. What happens if we send the string body without escaping the HTML? What is the cost if we don't call escapeHtml before sending it to the browser?
I can think of several possibilities to explain why sometimes a string is not escaped:
EDIT - The reason for escaping is that special characters like & and < can end up causing the browser to display something other than what you intended. A bare & is technically an error in the HTML. Most browsers try to deal intelligently with such errors and will display them correctly in most cases. (This will almost certainly happen with your example text if the string is text in a <div>, for instance.) However, because it is bad markup, some browsers will not work well; assistive technologies (e.g., text-to-speech) may fail; and there may be other problems.
There are several cases that will fail despite the best efforts of the browser to recover from bad markup. If your sample string were an attribute value, escaping the quote marks would be absolutely required. There's no way that a browser is going to correctly handle something like:
The general rule is that any character that is not markup but might be confused with markup needs to be escaped.
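To make the attribute-value case concrete, here is a minimal sketch in plain Java. The escape helper is a hypothetical stand-in for StringEscapeUtils.escapeHtml (it only handles four characters), and the imgTag method and the img element are assumptions chosen for illustration, not something taken from the original project.

```java
final class AttributeEscapeSketch {
    // Hypothetical helper: replaces the characters HTML could mistake for markup.
    static String escape(String s) {
        return s.replace("&", "&amp;")   // must run first so later entities are not re-escaped
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Illustrative only: places the given text inside a double-quoted attribute.
    static String imgTag(String altText) {
        return "<img alt=\"" + altText + "\">";
    }
}
```

Calling imgTag with the raw sample string yields an alt attribute that ends at the first embedded quote mark, so the rest of the text is parsed as stray attributes. Calling imgTag(escape(...)) keeps the whole string inside the attribute value.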
Note that there are several contexts in which text can appear within an HTML document, and they have separate requirements for escaping. Within attribute values, you need to escape quote marks and the ampersand (but not <). You must escape characters that have no representation in the character set of the document (unlikely if you are using UTF-8, but that's not always the case). Within text nodes, only & and < need to be escaped. Within href values, characters that need escaping in a URL must be escaped (and sometimes doubly escaped so they are still escaped after the browser unescapes them once). Within a CDATA block, generally nothing should be escaped (at the HTML level).
Finally, aside from the hazard of double-escaping, the cost of escaping all text is minimal: a tiny bit of extra processing and a few extra bytes on the network.
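The per-context rules above can be sketched roughly as follows. All class and method names, the /search URL, and the lang parameter are made up for illustration. The sketch assumes double-quoted attribute values and Java 10 or later (for the URLEncoder.encode overload that takes a Charset). Note that URLEncoder produces form-style encoding, so a space becomes a plus sign.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class ContextEscapeSketch {
    // Text node: only & and < need escaping.
    static String forText(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;");
    }

    // Double-quoted attribute value: & and the quote mark need escaping.
    static String forAttribute(String s) {
        return s.replace("&", "&amp;").replace("\"", "&quot;");
    }

    // Value of a URL query parameter: URL-encode it (form-style encoding).
    static String forQueryParam(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // An href needs both layers: URL-encode the value, then attribute-escape the whole URL.
    static String searchLink(String query, String label) {
        String url = "/search?q=" + forQueryParam(query) + "&lang=en"; // hypothetical URL
        return "<a href=\"" + forAttribute(url) + "\">" + forText(label) + "</a>";
    }
}
```

The href is escaped twice on purpose: the browser undoes the HTML layer when it parses the attribute, and the server undoes the URL layer when it reads the query string.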
From my experience, all strings should be HTML-escaped before being displayed on the page. Our current project manages the Organization Units from Active Directory, and these units can contain any special character (including HTML characters). When displaying them on the page, you could end up with the following code to show a record called
User <Marketing>
after the page is rendered, it will become
which actually appears as a
User
hyperlink on the page.
However, if you escape the HTML value before sending it to the page
after the page is rendered, it will become
which appears correctly on the JSP page.
In short, you escape HTML characters so that special input is not interpreted as markup. If the input contains HTML characters and is not escaped, your page will render incorrectly.
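As a rough illustration of the "User <Marketing>" case, here is a small Java sketch. The link markup, the /unit URL, and the method names are assumptions for illustration, not the project's actual JSP code, and escapeHtml here is a simplified stand-in for the library method.

```java
final class OrgUnitLinkSketch {
    // Simplified stand-in for a library escaper; handles only the common markup characters.
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Unescaped: the browser parses <Marketing> as an unknown tag and shows only "User ".
    static String renderUnsafe(String unitName) {
        return "<a href=\"/unit\">" + unitName + "</a>"; // "/unit" is a hypothetical URL
    }

    // Escaped: the browser displays the name exactly as stored.
    static String renderSafe(String unitName) {
        return "<a href=\"/unit\">" + escapeHtml(unitName) + "</a>";
    }
}
```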
You have to escape HTML or XML when there is a possibility that it might get interpreted along with the page-generated HTML (read: JSP).
This good question also explains it.
HTML (nowadays we would better say XML) defines many so-called "special" characters, which means that these characters have special meaning for the browser, in contrast with "normal" characters that just mean themselves. For example, the string "Hello, World!" contains only "normal" characters and thus literally means "Hello, World!" to the browser. The string "<b>Hello, World!</b>" contains the special characters '<', '>' and '/', and to the browser it means: typeset the string "Hello, World!" in bold, instead of just: typeset "<b>Hello, World!</b>".
The method escapeHtml(String) probably (I cannot tell for sure because I don't know how it is implemented) converts an arbitrary string into HTML code that will instruct the browser to typeset this string literally. For example, escapeHtml("<b>Hello, World!</b>") will return HTML code that the browser interprets as: typeset "<b>Hello, World!</b>" normally, instead of: typeset the string "Hello, World!" in bold. If the method escapeHtml(String) is implemented correctly, you should not care what the HTML code produced by this method looks like. Just use it wherever you want to ask the browser to typeset some string literally.
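To illustrate the round trip (the server escapes, the browser decodes), here is a minimal Java sketch. Both methods are hypothetical simplifications: escapeHtml stands in for the library method, and decodeEntities only approximates what a browser's parser does, for four entities.

```java
final class EscapeRoundTripSketch {
    // Simplified stand-in for the library's escapeHtml.
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;")   // first, so the entities added below stay intact
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Rough approximation of the decoding a browser performs on text content.
    static String decodeEntities(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&"); // last, so "&amp;lt;" decodes to "&lt;" and not "<"
    }
}
```

Escaping "<b>Hello, World!</b>" produces text with no markup characters left in it, and decoding that text gives back the original string, which is what the browser ends up displaying literally.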