I was hoping that you might be able to help me with a problem that I'm facing regarding JAXB.
I have the following XML file:
<root>
	<prop>
		<field1>
			<value1>v1</value1>
			<value2>v2</value2>
		</field1>
		<field2>
			<value1>v1</value1>
			<value2>v2</value2>
		</field2>
	</prop>
	<prop>
		text
		<field1>
			<value1>v1</value1>
			<value2>v2</value2>
		</field1>
	</prop>
	<prop>
		text
	</prop>
</root>
Under prop, the XML can contain other elements (field1, field2), text, or both.
And the following classes:
@XmlAccessorType(XmlAccessType.FIELD)
@XmlRootElement(name = "root")
public class Root {
    protected List<Root.Element> prop;

    @XmlAccessorType(XmlAccessType.FIELD)
    public static class Element {
        @XmlMixed
        protected List<String> content;
        @XmlElement
        public Field1 field1;
        @XmlElement
        public Field2 field2;

        @XmlAccessorType(XmlAccessType.FIELD)
        public static class Field1 {
            @XmlElement
            protected String value1;
            @XmlElement
            protected String value2;
        }

        @XmlAccessorType(XmlAccessType.FIELD)
        public static class Field2 {
            @XmlElement
            protected String value1;
            @XmlElement
            protected String value2;
        }
    }
}
I want to unmarshal the XML into the above classes. The issue is that the content list contains, besides the text, whitespace characters such as newlines and tabs. More specifically, for the above XML, unmarshalling gives me:
- first prop with content like ["\n\t\t", "\n\t\t", "\n\t"] - it should be an empty list
- second prop with content like ["\n\t\ttext\n\t\t", "\n\t"] - it should be a list with one string
- third prop with content like ["\n\t\ttext\n\t"] - it should be a list with one string
I have already tried creating an XmlAdapter, but it is applied to every element in the list, so if I strip the \n and \t characters and return null for an empty string, I still get a list containing some strings and some null values.
Why It's Happening
Whitespace content in an element that has mixed content is treated as significant.
How to Fix It
You could use JAXB with StAX to support this use case. With StAX you can create a filtered XMLStreamReader so that character strings containing only whitespace are not reported as events. Below is an example of how you could implement it.
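Here is a minimal sketch of such a filter. It uses only the JDK's javax.xml.stream API, so it runs without JAXB on the classpath. The class name WhitespaceFilterDemo and the helpers filteredReader and textEvents are made up for illustration, and turning on IS_COALESCING is an assumption made so that each run of text is judged as a whole rather than in pieces.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.StreamFilter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; the names here are illustrative only.
class WhitespaceFilterDemo {

    // Wraps a plain reader in a filter that hides whitespace-only text events.
    static XMLStreamReader filteredReader(XMLInputFactory xif, String xml)
            throws XMLStreamException {
        XMLStreamReader raw = xif.createXMLStreamReader(new StringReader(xml));
        StreamFilter skipBlankText = reader -> {
            if (reader.getEventType() == XMLStreamConstants.CHARACTERS) {
                return !reader.getText().trim().isEmpty();
            }
            return true; // every other event passes through unchanged
        };
        return xif.createFilteredReader(raw, skipBlankText);
    }

    // Collects the text events that survive the filter (a stand-in for unmarshalling).
    static List<String> textEvents(String xml) throws XMLStreamException {
        XMLInputFactory xif = XMLInputFactory.newFactory();
        // Assumption: coalesce adjacent character data so each text run is one event.
        xif.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLStreamReader xsr = filteredReader(xif, xml);
        List<String> texts = new ArrayList<>();
        while (xsr.hasNext()) {
            if (xsr.next() == XMLStreamConstants.CHARACTERS) {
                texts.add(xsr.getText());
            }
        }
        xsr.close();
        return texts;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<root>\n\t<prop>\n\t\t<field1>\n\t\t\t<value1>v1</value1>\n\t\t</field1>\n\t</prop>"
                + "\n\t<prop>\n\t\ttext\n\t</prop>\n</root>";
        System.out.println(textEvents(xml));
    }
}
```

With JAXB, the reader returned by filteredReader would be passed to Unmarshaller.unmarshal(XMLStreamReader) instead of being iterated by hand. Note that the filter drops only strings that are entirely whitespace: text such as "\n\t\ttext\n\t" still arrives with its surrounding whitespace, so trimming it remains a separate step.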