Why do Firefox and Chrome replace the LF character with CR+LF during POST?
I wrote the following as a test:
<html>
<head>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.6.2/jquery.js"></script>
<script type="text/javascript">
function lftest()
{
var linefeed = "before";
linefeed += String.fromCharCode(10); //linefeed
linefeed += "after";
$("#field").val(linefeed);
$("#formthing").submit();
}
</script>
</head>
<body>
<form id="formthing" method="post" action="http://someurl.com/resource">
<input type="hidden" id="field" value="" name="line" />
<a href="#" onclick="lftest()">send</a>
</form>
</body>
</html>
The developer tools network tab shows the POST data:
before%0D%0Aafter
It turns out that this has to do with the application/x-www-form-urlencoded encoding type. According to the HTML 4.01 spec:
Non-alphanumeric characters are replaced by '%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character.
Line breaks are represented as "CR LF" pairs (i.e., '%0D%0A').
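To see the two steps separately, here is a small sketch that mimics what the browser does to the hidden field's value before posting it. It assumes a runtime with the WHATWG URLSearchParams class (modern browsers or Node); normalizeLineBreaks and encodeFormField are hypothetical helper names, not part of any library. Note that URLSearchParams on its own does not convert LF to CR LF, which is why the normalization is done first.

```javascript
// Hypothetical helper: turn every bare CR, bare LF, or CR LF pair
// into a single CR LF pair, as the spec quoted above requires.
function normalizeLineBreaks(value) {
  return value.replace(/\r\n|\r|\n/g, "\r\n");
}

// Hypothetical helper: urlencode one field the way a form POST would.
function encodeFormField(name, value) {
  // URLSearchParams performs the %HH escaping (and space -> '+'),
  // but it leaves line breaks exactly as it finds them.
  const params = new URLSearchParams();
  params.append(name, normalizeLineBreaks(value));
  return params.toString();
}

const linefeed = "before" + String.fromCharCode(10) + "after";
console.log(encodeFormField("line", linefeed));                  // line=before%0D%0Aafter
console.log(new URLSearchParams({ line: linefeed }).toString()); // line=before%0Aafter
```

The first line matches what the network tab shows; the second shows that the extra %0D comes from the line-break normalization, not from the percent-encoding itself.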
Edit: be sure to read @pepsi's insightful comment; the following is probably all bogus :-)
It's because the HTTP protocol stipulates that CR-LF is the line terminator for everything except the "entity body", and the parameters aren't part of the entity body.
The more interesting question, therefore, is why Firefox (and Chrome? not sure) strip out carriage-return (CR) characters from <textarea> element values when code reads the "value" property of the DOM element, but put the CR back when posting. That means code that wants to implement something like the Stack Overflow comment "character counter" must take into account that the number of characters that will be posted is not necessarily the same as the number of characters in the "value" property.
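As a rough illustration, assuming the textarea's "value" uses bare LF line breaks as described above, a counter could estimate the posted length by normalizing line breaks before counting. submittedLength is a hypothetical helper, and it counts UTF-16 code units (what JavaScript's .length reports), not encoded bytes.

```javascript
// Hypothetical helper: length of a textarea value after the browser
// converts each line break to CR LF on submission.
function submittedLength(apiValue) {
  // Bare LF (Firefox/Chrome "value") becomes two characters, CR LF;
  // an existing CR LF pair is left as two characters.
  return apiValue.replace(/\r\n|\r|\n/g, "\r\n").length;
}

const value = "line one\nline two"; // as the "value" property reports it
console.log(value.length);           // 17
console.log(submittedLength(value)); // 18
```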
Finally, it's also interesting to note that jQuery normalizes browser behavior and makes sure that the ".val()" result for <textarea> elements never contains CR characters, making it uniformly wrong for all browsers :-)
Edit: actually, upon studying the RFC, it might be the case that in a POST request the parameters section is to be considered the "entity body". If so, the browsers are probably converting to CR-LF just to be conservative. Servers are supposed to be very flexible about line termination conventions, but maybe 10 years ago they weren't, and browsers did the simple thing and sent a normalized CR-LF pair to be safe.
Also note that IE has always done this too, sort of; the difference is that IE always reports the value of a <textarea> with its CR characters intact.
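Since servers are supposed to be flexible about line termination, a receiving application can simply accept all conventions. A minimal sketch, assuming a runtime with the WHATWG URLSearchParams class; decodeFieldLenient is a hypothetical name, and collapsing everything to LF is an assumption about what the application wants.

```javascript
// Hypothetical lenient decoder for an application/x-www-form-urlencoded
// body: accepts CR LF, bare LF, or bare CR and returns LF line breaks.
function decodeFieldLenient(body, name) {
  // URLSearchParams handles '+' and %HH decoding.
  const value = new URLSearchParams(body).get(name);
  if (value === null) {
    return null; // field not present in the body
  }
  // Collapse CR LF and bare CR to a single LF.
  return value.replace(/\r\n|\r/g, "\n");
}

// Both print "before\nafter"
console.log(JSON.stringify(decodeFieldLenient("line=before%0D%0Aafter", "line")));
console.log(JSON.stringify(decodeFieldLenient("line=before%0Aafter", "line")));
```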