I have a sample mixed-content XML document (structure cannot be modified):
<items>
<item> ABC123 <status>UPDATE</status>
<units>
<unit Description="Each ">EA <saleprice>2.99</saleprice>
<saleprice2/>
</unit>
</units>
<warehouses>
<warehouse>100<availability>2987.000</availability>
</warehouse>
</warehouses>
</item>
</items>
I am attempting to use a SAX parser on this XML document, but the mixed-content elements are causing some issues. Namely, I get an empty String returned when attempting to handle the <item/> node.
My handler:
@Override
public void startElement(final String uri, final String localName,
        final String qName, final Attributes attributes) throws SAXException {
    final String fixedQName = qName.toLowerCase();
    switch (fixedQName) {
        case "item":
            prod = new Product();
            //prod.setItem(content); <-- doesn't work, content is empty since element just started
            break;
    }
}

@Override
public void endElement(final String uri, final String localName, final String qName) throws SAXException {
    final String fixedQName = qName.toLowerCase();
    switch (fixedQName) {
        case "item":
            prod.setItem(content); // <-- doesn't work either, only returns an empty string
            // end element, set item
            productList.add(prod);
            break;
        case "status":
            prod.setStatus(content);
            break;
        // ... etc....
    }
}

@Override
public void characters(final char[] ch, final int start, final int length) throws SAXException {
    content = "";
    content = String.copyValueOf(ch, start, length).trim();
}
This handler works correctly for everything of interest except the <item/> element, for which it always returns an empty string.
If I add a println() to the characters() method to print out content, I can see that the parser does eventually print the contents of <item/>, but only after I expect it (on the next characters() invocation by the parser).
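The callback order can be made visible with a small standalone trace. This is only a sketch: the class name SaxEventTrace and the method trace are made up for this post and are not part of the real handler, and the XML is a shortened version of the sample above. It records each callback in the order the parser fires it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only; not part of the real handler.
class SaxEventTrace {

    // Records every SAX callback in the order the parser fires it.
    static List<String> trace(final String xml) {
        final List<String> events = new ArrayList<>();
        final DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                events.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Trimmed the same way as in my handler, so whitespace-only text shows as [].
                events.add("chars:[" + new String(ch, start, length).trim() + "]");
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Wrapped so that callers do not have to deal with checked exceptions.
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        // Shortened version of the sample document.
        String xml = "<items><item> ABC123 <status>UPDATE</status>\n</item></items>";
        trace(xml).forEach(System.out::println);
    }
}
```

With the JDK's default parser, I would expect the chunk containing ABC123 to show up between start:item and start:status, and the last characters() call before end:item to carry only whitespace. That would explain why content is empty in both startElement() and endElement() for <item/>. A parser is allowed to split a text run across several characters() calls, so the exact chunking is not guaranteed.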
Referencing http://docs.oracle.com/javase/tutorial/jaxp/sax/parsing.html, I know I should aggregate the strings returned from characters(). However, I don't see how to do that, since I also need to retrieve the other elements' data, and hard-coding an exception for the first element into the characters() method seems like the wrong approach.
How can I use SAX to retrieve the mixed-content <item/>'s data, 'ABC123'?
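One possible direction, sketched under assumptions rather than offered as the definitive approach: accumulate text in a StringBuilder inside characters(), and flush that buffer at every startElement() and endElement(). Text flushed when a child element opens is then attributed to the parent that is still open, so no special case for <item/> is needed. The names MixedContentHandler, textByElement, flush and parse are hypothetical, and the map stands in for the Product setters.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration; it maps each element name to its first non-blank text.
class MixedContentHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final Deque<String> open = new ArrayDeque<>(); // stack of currently open elements
    final Map<String, String> textByElement = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        flush(); // text seen so far belongs to the parent that is still open
        open.push(qName.toLowerCase());
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush(); // remaining text belongs to the element being closed
        open.pop();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length); // may be called several times per text run
    }

    // Assigns the buffered text to the innermost open element, then clears the buffer.
    private void flush() {
        String text = buffer.toString().trim();
        buffer.setLength(0);
        if (!text.isEmpty() && !open.isEmpty()) {
            // Keeps only the first text per element name; a real handler would
            // call something like prod.setItem(text) here instead.
            textByElement.putIfAbsent(open.peek(), text);
        }
    }

    static Map<String, String> parse(String xml) {
        MixedContentHandler handler = new MixedContentHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.textByElement;
    }

    public static void main(String[] args) {
        String xml = "<items><item> ABC123 <status>UPDATE</status>\n</item></items>";
        System.out.println(parse(xml).get("item"));
    }
}
```

Because the map keeps one entry per element name, this sketch only suits a single <item/>. With several items, the flush step would need to write into the current Product rather than a shared map.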