Ignoring DTD when parsing XML

Posted 2019-02-21 16:45

How can I ignore the DTD declaration when parsing a file with the XOM XML library? My file starts with the following lines:

<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "NCBI_BlastOutput.dtd">
<!-- rest of the document here -->

When I try to build() my document, I get a FileNotFoundException for the DTD file. I know I don't have this file and I don't care about it, so how can I make XOM ignore it?

Here is a code snippet:

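import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;
import nu.xom.Builder;
import nu.xom.Document;
import nu.xom.Element;
import nu.xom.Elements;
import nu.xom.ParsingException;

public class BlastXMLParser {

    private Document _document;

    public BlastXMLParser(String filePath) {
        Builder b = new Builder(false); // non-validating builder
        // not a good idea to have exception-throwing code in a constructor
        try {
            _document = b.build(filePath);
        } catch (ParsingException ex) {
            Logger.getLogger(BlastXMLParser.class.getName()).log(Level.SEVERE, "err", ex);
        } catch (IOException ex) {
            // The FileNotFoundException for the missing DTD is caught here;
            // swallowing it silently leaves _document null.
            Logger.getLogger(BlastXMLParser.class.getName()).log(Level.SEVERE, "err", ex);
        }
    }

    private Elements getBlastReads() {
        Element root = _document.getRootElement();
        Elements rootChildren = root.getChildElements();

        for (int i = 0; i < rootChildren.size(); i++) {
            Element child = rootChildren.get(i);
            if (child.getLocalName().equals("BlastOutput_iterations")) {
                return child.getChildElements();
            }
        }

        return null;
    }
}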

I get a NullPointerException at this line:

Element root = _document.getRootElement();

With the DTD line removed from the source XML file, I can parse it successfully, but editing the file is not an option in the final production system.

Tags: java xml xom
2 answers
Ridiculous、
Post #2 · 2019-02-21 17:07

The preferred solution would be to implement an EntityResolver that intercepts requests for the DTD and redirects these to an embedded copy. If you

  1. don't have access to the DTD and
  2. are absolutely sure you won't need it (apart from validation it might also declare character entities that are used in the document) and
  3. you are using the Xerces XML Parser implementation

you can disable fetching of the DTD by setting the corresponding Xerces feature through SAX. In XOM this should be possible by passing a configured XMLReader to the Builder constructor, like this:

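import nu.xom.Builder;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

...

// createXMLReader() and setFeature() throw SAXException: handle or declare it.
// XMLReaderFactory is deprecated since Java 9; on newer JDKs use
// SAXParserFactory.newInstance().newSAXParser().getXMLReader() instead.
XMLReader xmlreader = XMLReaderFactory.createXMLReader();
// Xerces-specific feature: a non-validating parser will not load the external DTD
xmlreader.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
Builder builder = new Builder(xmlreader);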
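Both ideas from this answer can be tried without XOM, using only the JDK's SAX API. The sketch below is illustrative and rests on assumptions: the class and method names (SkipDtdDemo, readerWithEmptyDtdResolver, readerWithoutExternalDtd, rootName) are hypothetical; the resolver answers every ".dtd" request with an empty DTD, which is a simplification of the "embedded copy" approach and only safe under the three conditions listed above; and it assumes the JDK's bundled parser (derived from Xerces) recognises the load-external-dtd feature.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SkipDtdDemo {

    // Option 1 (hypothetical policy): an EntityResolver that answers every
    // *.dtd request with an empty document instead of opening the file.
    public static XMLReader readerWithEmptyDtdResolver()
            throws ParserConfigurationException, SAXException {
        XMLReader reader = newReader();
        reader.setEntityResolver((publicId, systemId) ->
                systemId != null && systemId.endsWith(".dtd")
                        ? new InputSource(new StringReader(""))
                        : null); // null: let the parser resolve anything else normally
        return reader;
    }

    // Option 2: Xerces-specific feature that stops a non-validating parser
    // from loading the external DTD at all.
    public static XMLReader readerWithoutExternalDtd()
            throws ParserConfigurationException, SAXException {
        XMLReader reader = newReader();
        reader.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        return reader;
    }

    // Creates a namespace-aware SAX reader from the JDK's default factory.
    private static XMLReader newReader()
            throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    // Parses xml with the given reader and returns the root element's name.
    public static String rootName(XMLReader reader, String xml)
            throws SAXException, IOException {
        String[] root = new String[1];
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (root[0] == null) {
                    root[0] = qName; // first element seen is the root
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return root[0];
    }
}
```

Either reader could then be handed to XOM with new Builder(reader). Whether XOM keeps an EntityResolver that was set on the reader before the Builder is constructed is an assumption worth checking against your XOM version; the feature-based reader does not depend on that.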
smile是对你的礼貌
Post #3 · 2019-02-21 17:07

According to the XOM documentation, this is how to parse a document without any validation:

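import java.io.IOException;
import nu.xom.Builder;
import nu.xom.Document;
import nu.xom.ParsingException;

try {
  // new Builder() uses the default SAX2 parser and does not validate
  Builder parser = new Builder();
  Document doc = parser.build("http://www.cafeconleche.org/");
}
catch (ParsingException ex) {
  System.err.println("Cafe con Leche is malformed today. How embarrassing!");
}
catch (IOException ex) {
  System.err.println("Could not connect to Cafe con Leche. The site may be down.");
}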

If you do want to validate the document (against its DTD), pass true to the constructor, new Builder(true):

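import java.io.IOException;
import nu.xom.Builder;
import nu.xom.Document;
import nu.xom.ParsingException;
import nu.xom.ValidityException;

try {
  Builder parser = new Builder(true); // validate while parsing
  Document doc = parser.build("http://www.cafeconleche.org/");
}
// ValidityException is a subclass of ParsingException, so catch it first
catch (ValidityException ex) {
  System.err.println("Cafe con Leche is invalid today. (Somewhat embarrassing.)");
}
catch (ParsingException ex) {
  System.err.println("Cafe con Leche is malformed today. (How embarrassing!)");
}
catch (IOException ex) {
  System.err.println("Could not connect to Cafe con Leche. The site may be down.");
}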

Note that this version can throw one more exception, ValidityException. Because it extends ParsingException, it has to be caught before ParsingException.
