I'm trying to parse a String which contains XML content which conforms to the XML 1.1 spec. The XML contains character references which are not allowed in the XML 1.0 spec but which are allowed in the XML 1.1 spec (character references which translate to Unicode characters in the range U+0001–U+001F).
According the Xerces2 website, the Xerces2 parser supports parsing XML 1.1 documents. However, I cannot figure out how to tell it the XML we are trying to parse contains 1.1-compliant XML.
I'm using a DocumentBuilder to parse the XML (something like this):
public Element parseString(String xmlString) {
try {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = dbf.newDocumentBuilder();
InputSource source = new InputSource(new StringReader(xmlString));
// Throws org.xml.sax.SAXParseException becuase of the invalid character refs
Document doc = documentBuilder.parse(source);
return doc.getDocumentElement();
} catch (ParserConfigurationException pce) {
// Handle the error
} catch (SAXException se) {
// Handle the error
} catch (IOException ioe) {
// Handle the error
}
}
I've tried setting the XML header to indicate the XML conforms to the 1.1 spec...
xmlString = "<?xml version=\"1.1\" encoding=\"UTF-8\" ?>" + xmlString;
...but it is still parsed as 1.0 XML (still generates the invalid character reference exceptions).
How can I configure the Xerces parser to parse the XML as XML 1.1? Is there an alternative parser which provides better support for XML 1.1?