Does the XML specification state that parser need

Posted 2019-07-15 07:44

Question:

I've run into a problem handling line-feed and carriage-return characters in XML. I know that, according to http://www.w3.org/TR/REC-xml/#sec-line-ends, XML processors are required to replace any "\r\n" sequence or lone "\r" with "\n".
The specification says this is the required behaviour for any "external parsed entity". Does it also apply to CDATA sections inside an element?
Thank you,

Michele

I'm sure that the MSXML library, for example, converts every "\r\n" sequence or lone "\r" to "\n", regardless of whether it appears in a CDATA section.

Answer 1:

I'll quote a sentence from the section you link to (emphasis mine):

To simplify the tasks of applications, the XML processor must behave as if it normalized all line breaks in external parsed entities (including the document entity) on input, before parsing, by translating both the two-character sequence #xD #xA and any #xD that is not followed by #xA to a single #xA character.

Because the XML processor does this before parsing, it doesn't know yet which parts of the document are CDATA sections. Therefore, it will do the replacement regardless of the characters being in a CDATA section or not.
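To make the "before parsing" point concrete, here is a minimal Python sketch. The helper `normalize_line_breaks` and the sample document are hypothetical names made up for this illustration. The second half assumes Python's standard `xml.etree.ElementTree` (which is backed by expat) as one example of a processor; it does not show what MSXML does.

```python
import re
import xml.etree.ElementTree as ET


def normalize_line_breaks(text: str) -> str:
    """Translate each CR LF pair and each lone CR to a single LF."""
    return re.sub(r"\r\n?", "\n", text)


# Hypothetical sample: a CDATA section containing a CR LF pair and a lone CR.
raw = "<doc><![CDATA[one\r\ntwo\rthree]]></doc>"

# The rule is defined on the raw input text, so the CDATA markers play no role.
normalized = normalize_line_breaks(raw)

# Parsing the un-normalized bytes: the CDATA content comes back with LF only.
parsed_text = ET.fromstring(raw.encode("utf-8")).text
```

The regular expression only mirrors the rule quoted above. A real processor may fold the translation into its tokenizer instead of running a separate pass, since the spec only requires that it "behave as if" it normalized first.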

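To reliably preserve these characters, they have to be written to the XML document as the character references &#13; and &#10;. Note that character references are not recognized inside a CDATA section, so they have to appear in ordinary character data.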
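As a rough check of this advice, the sketch below compares a literal CR LF pair with the character references `&#13;` and `&#10;`. It again assumes Python's `xml.etree.ElementTree` as the processor, and the sample documents are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# A literal CR LF pair in the input is subject to line-break normalization.
literal = ET.fromstring(b"<doc>a\r\nb</doc>").text

# Character references are expanded during parsing, after normalization,
# so the carriage return reaches the application.
referenced = ET.fromstring(b"<doc>a&#13;&#10;b</doc>").text

# Inside a CDATA section the same references are just literal text.
in_cdata = ET.fromstring(b"<doc><![CDATA[a&#13;&#10;b]]></doc>").text
```

Here `literal` contains only a line feed between `a` and `b`, `referenced` keeps both the carriage return and the line feed, and `in_cdata` holds the ten characters `&#13;&#10;` unexpanded. This is why the references cannot be used inside the CDATA section itself.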



Answer 2:

Yes: "\r\n" or a lone "\r" in a CDATA section must be replaced with "\n" for a processor to be conforming. Any CDATA sections in your XML document are part of the document entity, which is a parsed entity. You can find an example of an unparsed entity here.