When a web server sends Content-Type: text/plain in an HTTP response, can the client assume newlines are '\n', '\r\n', or something else, or should it allow both?
Which standards specify this? I am lost and confused among the standards. RFC 2046 appears to define the 'plain' subtype, but it refers to RFC 822.
I've skimmed RFC 822 but I'm confused about whether it is saying CRLF (\r\n) is explicitly not allowed (in the message body), or whether CRLF should implicitly be allowed because any ASCII character is legal after the blank line?
RFC 5322 defines the 'internet message format', and I'm not sure whether it applies to HTTP (it seems intended for email), but it specifically says that the only CR or LF you should see in the message body is the CRLF combination.
RFC 2046 section 4.1.1 says:
"The canonical form of any MIME "text" subtype MUST always represent a line break as a CRLF sequence. Similarly, any occurrence of CRLF in MIME "text" MUST represent a line break. Use of CR and LF outside of line break sequences is also forbidden."
To be honest, though, if you're using this for parsing or display purposes, I wouldn't rely on it. Most web servers set the Content-Type from the file extension, so any Unixy file with a .txt extension is going to be served as text/plain with bare LF line endings (illegally, as far as the paragraph above is concerned).
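If you do need to parse such a body, one tolerant approach is to accept all three conventions rather than trusting the declared type. The following is only a minimal sketch in Python; the name `split_lines` is made up for illustration, and it assumes the body has already been decoded to a string:

```python
import re

# Try CRLF first so it counts as a single line break,
# then fall back to a bare CR or a bare LF.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")

def split_lines(body: str) -> list:
    """Split a text/plain body into lines, accepting CRLF, CR, or LF."""
    lines = _LINE_BREAK.split(body)
    # A trailing line break leaves an empty final element; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines

print(split_lines("one\r\ntwo\nthree\rfour\n"))
```

A regex is used here instead of `str.splitlines()` because the latter also splits on other characters (form feed, vertical tab, and some Unicode separators), which may or may not be what you want.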