How can I ignore the DTD declaration when parsing a file with the XOM XML library? My file starts with the following lines:
<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "NCBI_BlastOutput.dtd">
//rest of stuff here
When I try to build() my document, I get a FileNotFoundException for the DTD file. I know I don't have this file and I don't care about it, so how can I make XOM skip it?
Here is a code snippet:
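import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;
import nu.xom.Builder;
import nu.xom.Document;
import nu.xom.Element;
import nu.xom.Elements;
import nu.xom.ParsingException;

public class BlastXMLParser {
    private Document _document;

    public BlastXMLParser(String filePath) {
        Builder b = new Builder(false);
        //not a good idea to have exception-throwing code in constructor
        try {
            _document = b.build(filePath);
        } catch (ParsingException ex) {
            Logger.getLogger(BlastXMLParser.class.getName()).log(Level.SEVERE, "err", ex);
        } catch (IOException ex) {
            // swallowed: an I/O failure such as the missing DTD leaves _document null
        }
    }

    private Elements getBlastReads() {
        Element root = _document.getRootElement();
        Elements rootChildren = root.getChildElements();
        for (int i = 0; i < rootChildren.size(); i++) {
            Element child = rootChildren.get(i);
            if (child.getLocalName().equals("BlastOutput_iterations")) {
                return child.getChildElements();
            }
        }
        return null;
    }
}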
I get a NullPointerException at this line:
Element root = _document.getRootElement();
With the DTD line removed from the source XML file I can successfully parse it, but this is not an option in the final production system.
The preferred solution would be to implement an EntityResolver that intercepts requests for the DTD and redirects them to an embedded copy. If that is not an option, you can disable fetching of the DTD by setting the corresponding SAX feature. In XOM this should be possible by configuring an XMLReader with that feature turned off and passing it to the Builder constructor.
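Both options can be sketched with the JDK's own SAX classes. This is a minimal sketch under a few assumptions: the feature URI is the Xerces-specific `load-external-dtd` feature (the JDK's built-in parser recognises it, other parsers may not), the class and method names are made up for illustration, and the resolver returns empty content as a stand-in for an embedded copy of the DTD.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; none of these names come from XOM.
public class DtdSkippingReader {

    // Assumption: Xerces-specific feature, not part of the SAX standard.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Same prolog as in the question; the DTD file does not exist locally.
    static final String SAMPLE =
            "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE BlastOutput PUBLIC \"-//NCBI//NCBI BlastOutput/EN\" "
            + "\"NCBI_BlastOutput.dtd\">\n"
            + "<BlastOutput/>";

    private static XMLReader newReader()
            throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    // Option 1: tell the parser not to fetch the external DTD at all.
    static XMLReader readerWithoutDtdLoading()
            throws ParserConfigurationException, SAXException {
        XMLReader reader = newReader();
        reader.setFeature(LOAD_EXTERNAL_DTD, false);
        return reader;
    }

    // Option 2: intercept every external entity request. Here the answer is
    // empty content; a real resolver would return the embedded DTD copy.
    static XMLReader readerWithEmptyDtd()
            throws ParserConfigurationException, SAXException {
        XMLReader reader = newReader();
        reader.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
        return reader;
    }

    // Parses SAMPLE with the given reader and returns the root element name.
    static String rootName(XMLReader reader) throws IOException, SAXException {
        final String[] root = new String[1];
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = localName;
                }
            }
        });
        reader.parse(new InputSource(new StringReader(SAMPLE)));
        return root[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootName(readerWithoutDtdLoading()));
        System.out.println(rootName(readerWithEmptyDtd()));
        // With XOM on the classpath, either reader would be handed over as:
        //   Builder builder = new Builder(readerWithoutDtdLoading());
    }
}
```

With XOM available, the configured reader is passed to `new Builder(reader)` in place of `new Builder(false)` from the question. Whether XOM keeps a resolver set this way has not been verified here, so the feature-based reader is the safer one to try first.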
According to the XOM documentation, a non-validating Builder (new Builder() or new Builder(false)) is the way to parse a document without any validation.
If you do want to validate the document against its DTD, you have to call
new Builder(true)
Note that yet another exception can then be thrown:
ValidityException