I'm trying to parse a multi-line XML attribute in Java using the classic DOM. The parsing itself works fine. However, the line breaks are lost: when I render the parsed string, each line break has been replaced by a plain space.
<string key="help_text" value="This is a multi line long
text. This should be parsed
and rendered in multiple lines" />
To get the attribute I'm using:
attributes.getNamedItem("value").getTextContent()
If I just pass a manually typed string to the render method using "\n", the text gets drawn as intended.
Any ideas?
I've used JDOM for this in the past. It saves you a lot of trouble when decoding multi-line attributes and makes XML parsing and writing in Java much more pleasant. JDOM is also compatible with Android development and it's really tiny (only one jar file).
https://github.com/hunterhacker/jdom
According to the XML specification the XML parser MUST normalize attribute whitespace, such as replacing a line break character with a space. I.e. if you require line breaks to be preserved you cannot use an attribute value.
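A minimal sketch of what this means in practice, using the standard JAXP DOM parser. The class name AttributeVsElement and the helper parseRoot are made up for illustration, and moving the text into element content is just one possible alternative to an attribute value, not something the question's format requires:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class AttributeVsElement {

    // Hypothetical helper: parses an XML string and returns its root element.
    public static Element parseRoot(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        // Same text, once as an attribute value and once as element content.
        String asAttribute = "<string key=\"help_text\" value=\"line one\nline two\" />";
        String asElement = "<string key=\"help_text\">line one\nline two</string>";

        String fromAttribute = parseRoot(asAttribute).getAttribute("value");
        String fromElement = parseRoot(asElement).getTextContent();

        // The literal line break in the attribute is normalized to a space.
        System.out.println(fromAttribute.contains("\n")); // false
        // Element content is not subject to attribute-value normalization.
        System.out.println(fromElement.contains("\n"));   // true
    }
}
```

If the file format is under your control, storing long help texts as element content avoids the problem without any extra encoding step.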
In general, whitespace handling in XML is a lot of trouble. In particular, the difference between CR, LF, and CRLF isn't preserved anywhere.
You might find it better to encode newlines in attributes as &lt;br /&gt; (that is, the escaped version of <br />) and then decode them later.
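The placeholder idea could look roughly like the sketch below. It assumes the marker arrives from the parser as the literal text <br /> (the parser has already unescaped it by then); the class name BreakPlaceholder and the methods encode and decode are hypothetical names chosen for this example:

```java
public class BreakPlaceholder {

    // Marker as it appears AFTER parsing; in the XML file it must be escaped.
    static final String PLACEHOLDER = "<br />";

    // Replace real line breaks with the marker before the value is written out.
    // An XML serializer will then escape the angle brackets for you.
    public static String encode(String text) {
        return text.replace("\r\n", "\n").replace("\n", PLACEHOLDER);
    }

    // Turn the marker back into real line breaks after reading the attribute.
    public static String decode(String attributeValue) {
        return attributeValue.replace(PLACEHOLDER, "\n");
    }

    public static void main(String[] args) {
        String parsed = "This is a multi line long<br />text.";
        System.out.println(decode(parsed));
    }
}
```

In the question's code, the call would wrap the existing lookup, for example decode(attributes.getNamedItem("value").getTextContent()). The obvious limitation is that text which legitimately contains the marker gets a line break it did not ask for.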
From the XML specification:
3.3.3 Attribute-Value Normalization. You will see that every whitespace character is normalised to a space:
Before the value of an attribute is passed to the application or
checked for validity, the XML processor MUST normalize the attribute
value by applying the algorithm below, or by using some other method
such that the value passed to the application is the same as that
produced by the algorithm.

1. All line breaks MUST have been normalized on input to #xA as
   described in 2.11 End-of-Line Handling, so the rest of this
   algorithm operates on text normalized in this way.
2. Begin with a normalized value consisting of the empty string.
3. For each character, entity reference, or character reference in the
   unnormalized attribute value, beginning with the first and continuing
   to the last, do the following:
   - For a character reference, append the referenced character to the
     normalized value.
   - For an entity reference, recursively apply step 3 of this algorithm
     to the replacement text of the entity.
   - For a white space character (#x20, #xD, #xA, #x9), append a space
     character (#x20) to the normalized value.
   - For another character, append the character to the normalized value.
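One practical consequence of the algorithm quoted above: a character reference is appended as the referenced character and is not turned into a space. So a line break written as &#10; inside the attribute value should survive parsing. A small sketch with the standard JAXP DOM parser (the class name CharRefNewline and the helper readValue are invented for this example):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class CharRefNewline {

    // Hypothetical helper: parses the XML and returns the root's "value" attribute.
    public static String readValue(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getAttribute("value");
    }

    public static void main(String[] args) throws Exception {
        // &#10; is a character reference to #xA (line feed).
        String xml = "<string key=\"help_text\" "
                + "value=\"This is a multi line long&#10;text.\" />";

        String value = readValue(xml);

        // The reference is resolved to a real line feed, not to a space.
        System.out.println(value.contains("\n")); // true
        System.out.println(value);
    }
}
```

This keeps the text in an attribute, at the cost of a less readable XML file, since every line break has to be written as &#10; by hand or by the tool that produces the file.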