I'm trying to write something in Java that receives an XML string and validates it against an XSD schema, and does automatic error handling for some simple common errors, and outputs a fixed XML string.
I've come across the SAX ErrorHandler interface for the Validator.validate() function, but that seems to be mostly for reporting exceptions, and I can't figure out how to modify the XML from it. All it gives me is the line/column number, and fixing problems from that would be very tedious.
I also found the Validator.validate() overload that takes a source and a result and returns augmented XML. To my knowledge it just fills in missing attributes that have default values, which is part of what I need to do.
But I also need something along the lines of fixing a missing start or end tag, correcting a tag that has been misspelled by a letter, and things like that. There are so many "Handler" interfaces (ValidationHandler, ContentHandler, EntityResolver) that I'm not sure which ones to look at in depth, so if someone could point me in the right direction that would be great (I don't need a detailed code example).
Also, I'm not sure how the XMLReader fits into it all.
To deal with errors, you have to implement the ErrorHandler interface, or extend the DefaultHandler helper class and override its error method. That is the method called for validation errors. If you want to be more precise, I think you will have to analyze the error message. I don't think SAX will give you anything that makes errors easy to fix.
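As a rough illustration of the ErrorHandler approach, here is a minimal sketch that collects validation errors instead of stopping at the first one. The names CollectingValidation, CollectingErrorHandler and the tiny inline book schema are made up for this example; only the javax.xml.validation and org.xml.sax calls are standard API. Each SAXParseException carries a line number, a column number and a message, which is about all the information you get to work with.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

class CollectingValidation {

    // Hypothetical schema used only for this sketch: <book> with one <title>.
    public static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='book'><xs:complexType><xs:sequence>"
      + "<xs:element name='title' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Records every validation error so that validation can continue.
    static class CollectingErrorHandler implements ErrorHandler {
        final List<SAXParseException> errors = new ArrayList<>();

        @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }

        @Override public void error(SAXParseException e) { errors.add(e); }

        // A fatal error means the XML is not well-formed; parsing cannot go on.
        @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
    }

    // Validates an XML string against an XSD string and returns the errors found.
    public static List<SAXParseException> validate(String xml, String xsd) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();
        CollectingErrorHandler handler = new CollectingErrorHandler();
        validator.setErrorHandler(handler);
        validator.validate(new StreamSource(new StringReader(xml)));
        return handler.errors;
    }

    public static void main(String[] args) throws Exception {
        // "titel" is a deliberate one-letter misspelling of "title".
        for (SAXParseException e : validate("<book><titel>Dune</titel></book>", XSD)) {
            System.out.println(e.getLineNumber() + ":" + e.getColumnNumber() + " " + e.getMessage());
        }
    }
}
```

Any automatic repair would have to be built on top of these reports, for example by inspecting the message text, since the handler itself has no way to change the document being parsed.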
BTW, note that for validating against an XSD, you should not use the setValidating method. See the code below.
The Javadoc (1.7) of the setValidating method says:
Note that "the validation" here means a validating parser as defined in the XML recommendation. In other words, it essentially just controls the DTD validation. (except the legacy two properties defined in JAXP 1.2.)
To use modern schema languages such as W3C XML Schema or RELAX NG instead of DTD, you can configure your parser to be a non-validating parser by leaving the setValidating(boolean) method false, then use the setSchema(Schema) method to associate a schema to a parser.
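import java.io.File;
import java.io.PrintStream;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
// ...
public class Validator {
    public static void main(String[] args) throws Exception {
        if (args.length == 0 || args.length > 2) {
            System.err.println("Usage: java Validator <doc.xml> [<schema.xsd>]");
            System.exit(1);
        }
        // Compile the XSD; the schema path defaults to book.xsd.
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        String xsdpath = "book.xsd";
        if (args.length == 2) {
            xsdpath = args[1];
        }
        Schema s = sf.newSchema(new File(xsdpath));
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);    // false: no DTD validation
        factory.setNamespaceAware(true);
        factory.setSchema(s);            // this is what turns on XSD validation
        XMLReader parser = factory.newSAXParser().getXMLReader();
        parser.setFeature("http://xml.org/sax/features/namespaces", true);
        parser.setFeature("http://xml.org/sax/features/namespace-prefixes", false);
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        // MyHandler is your own ContentHandler implementation (not shown here).
        parser.setContentHandler(new MyHandler(out));
        // A plain DefaultHandler silently ignores validation errors, so override error().
        parser.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                System.err.println(e.getLineNumber() + ":" + e.getColumnNumber() + " " + e.getMessage());
            }
        });
        parser.parse(args[0]);
    }
}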
I've used DocumentBuilderFactory with setValidating(true) to generate an instance of an XML validating parser (i.e. DocumentBuilder).
Note that both validating and non-validating XML parsers will verify that the XML is "well formed" (e.g. end-tags, etc.). "Validating" refers to checking that the XML conforms to a DTD or schema.
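To make the distinction concrete, here is a small sketch showing that a non-validating DocumentBuilder, with no DTD or schema involved, still rejects a document with a missing end tag. The class name WellFormedCheck and the helper firstFatalError are invented for this example. A DefaultHandler is installed as the error handler only so that the fatal error is rethrown rather than printed by the parser's default behaviour.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {

    // Returns null if the XML is well-formed, otherwise the parser's fatal error.
    public static SAXParseException firstFatalError(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false); // no DTD validation, and no schema is set
        DocumentBuilder builder = factory.newDocumentBuilder();
        // DefaultHandler.fatalError rethrows the exception without printing it.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return e;
        }
    }

    public static void main(String[] args) throws Exception {
        // <b> is never closed, so this is not well-formed XML.
        SAXParseException e = firstFatalError("<a><b></a>");
        System.out.println(e == null ? "well-formed" : "not well-formed: " + e.getMessage());
    }
}
```

Because a missing end tag is a fatal error, the parser stops at that point, so repairing this kind of problem has to happen on the raw text before any schema validation can run.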