Make DocumentBuilder.parse ignore DTD references

Posted 2019-01-01 12:57

Question:

When I parse my XML file (variable f) in this method, I get an error:

C:\Documents and Settings\joe\Desktop\aicpcudev\OnlineModule\map.dtd (The system cannot find the path specified)

I know I do not have the dtd, nor do I need it. How can I parse this File object into a Document object while ignoring DTD reference errors?

private static Document getDoc(File f, String docId) throws Exception{
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    DocumentBuilder db = dbf.newDocumentBuilder();
    Document doc = db.parse(f);


    return doc;
}

Answer 1:

A similar approach to the one suggested by @anjanb

    builder.setEntityResolver(new EntityResolver() {
        @Override
        public InputSource resolveEntity(String publicId, String systemId)
                throws SAXException, IOException {
            if (systemId.contains("foo.dtd")) {
                return new InputSource(new StringReader(""));
            } else {
                return null;
            }
        }
    });

I found that simply returning an empty InputSource worked just as well.
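
For reference, here is a minimal self-contained sketch of that simpler variant, assuming the JDK's built-in parser. The class name EmptyDtdParser is hypothetical and not part of the original answer. Note that it answers every external entity lookup with empty content, not only the DTD, so it only suits documents that do not depend on anything the DTD declares.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class; the name is illustrative only.
final class EmptyDtdParser {

    /** Parses XML text, answering every external entity lookup with empty content. */
    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Returning an empty source (instead of null) stops the parser from
        // opening the systemId itself, so a missing DTD file is never touched.
        builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<!DOCTYPE map SYSTEM \"map.dtd\"><map><entry/></map>");
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```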



Answer 2:

Try setting features on the DocumentBuilderFactory:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

dbf.setValidating(false);
dbf.setNamespaceAware(true);
dbf.setFeature("http://xml.org/sax/features/namespaces", false);
dbf.setFeature("http://xml.org/sax/features/validation", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

DocumentBuilder db = dbf.newDocumentBuilder();
...

Ultimately, I think the options are specific to the parser implementation. Here is some documentation for Xerces2 if that helps.
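
Because the feature URIs are parser-specific, one possible way to guard against a parser that rejects them is sketched below. This is only an illustration: the class LenientBuilders and its fallback to an empty entity resolver are assumptions, not part of the answer above. It relies on setFeature throwing ParserConfigurationException for a feature the underlying parser does not support.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;

// Hypothetical helper; class and method names are illustrative.
final class LenientBuilders {

    private static final String[] XERCES_DTD_FEATURES = {
        "http://apache.org/xml/features/nonvalidating/load-dtd-grammar",
        "http://apache.org/xml/features/nonvalidating/load-external-dtd",
    };

    /** Returns a non-validating builder that should not fetch external DTDs. */
    static DocumentBuilder newBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(false);
        boolean featuresApplied = true;
        for (String feature : XERCES_DTD_FEATURES) {
            try {
                dbf.setFeature(feature, false);
            } catch (ParserConfigurationException e) {
                // This parser does not support the Xerces feature URI.
                featuresApplied = false;
            }
        }
        DocumentBuilder db = dbf.newDocumentBuilder();
        if (!featuresApplied) {
            // Fallback: answer every external entity lookup with empty content.
            db.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));
        }
        return db;
    }
}
```

A caller would then use LenientBuilders.newBuilder().parse(f) in place of the two factory lines in the question.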



Answer 3:

I ran into this issue when the DTD file was in the jar file along with the XML. I solved it based on the examples here, as follows:

DocumentBuilder db = dbf.newDocumentBuilder();
db.setEntityResolver(new EntityResolver() {
    public InputSource resolveEntity(String publicId, String systemId) throws SAXException, IOException {
        if (systemId.contains("doc.dtd")) {
            InputStream dtdStream = MyClass.class
                    .getResourceAsStream("/my/package/doc.dtd");
            return new InputSource(dtdStream);
        } else {
            return null;
        }
    }
});
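
One thing to watch with this approach: getResourceAsStream returns null when the resource is not on the classpath. A slightly more defensive variant might look like the sketch below. The class ClasspathDtdResolver, its constructor arguments, and the choice to fall back to an empty DTD are all assumptions made for illustration.

```java
import java.io.InputStream;
import java.io.StringReader;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver; the class name and constructor arguments are illustrative.
final class ClasspathDtdResolver implements EntityResolver {
    private final String dtdFileName;   // e.g. "doc.dtd"
    private final String resourcePath;  // e.g. "/my/package/doc.dtd"

    ClasspathDtdResolver(String dtdFileName, String resourcePath) {
        this.dtdFileName = dtdFileName;
        this.resourcePath = resourcePath;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId == null || !systemId.endsWith(dtdFileName)) {
            return null; // let the parser resolve everything else as usual
        }
        InputStream in = ClasspathDtdResolver.class.getResourceAsStream(resourcePath);
        if (in == null) {
            // Assumed fallback: the resource is missing, so substitute an empty DTD.
            return new InputSource(new StringReader(""));
        }
        return new InputSource(in);
    }
}
```

It would be installed with db.setEntityResolver(new ClasspathDtdResolver("doc.dtd", "/my/package/doc.dtd")).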


Answer 4:

"I know I do not have the dtd, nor do I need it."

I am suspicious of this statement; does your document contain any entity references? If so, you definitely need the DTD.

Anyway, the usual way of preventing this from happening is using an XML catalog to define a local path for "map.dtd".
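
As a rough sketch of the catalog approach, assuming Java 11 or later and the javax.xml.catalog API that ships with the JDK: the example writes a throwaway DTD and catalog to a temporary directory, then lets the catalog resolver map any system identifier ending in "map.dtd" to the local copy. The class CatalogDemo, the file names, and the "company" entity are made up for illustration. It also shows why the DTD can matter: the entity reference only expands because the local DTD declares it.

```java
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; file names and the "company" entity are made up.
final class CatalogDemo {

    static String parseWithCatalog() throws Exception {
        Path dir = Files.createTempDirectory("catalog-demo");

        // A local copy of the DTD that declares the entity the document uses.
        Path dtd = dir.resolve("map.dtd");
        Files.writeString(dtd, "<!ENTITY company \"ACME\">");

        // The catalog maps any system identifier ending in "map.dtd" to that copy.
        Path catalog = dir.resolve("catalog.xml");
        Files.writeString(catalog,
                "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">"
                + "<systemSuffix systemIdSuffix=\"map.dtd\" uri=\"" + dtd.toUri() + "\"/>"
                + "</catalog>");

        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // CatalogResolver implements EntityResolver, so it plugs in directly.
        db.setEntityResolver(
                CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalog.toUri()));

        String xml = "<!DOCTYPE map SYSTEM \"map.dtd\"><map>&company;</map>";
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }
}
```

In a real project the catalog would be a checked-in file rather than one generated at run time.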



Answer 5:

Here's another user who had the same issue: http://forums.sun.com/thread.jspa?threadID=284209&forumID=34

User ddssot on that post says:

myDocumentBuilder.setEntityResolver(new EntityResolver() {
    public InputSource resolveEntity(java.lang.String publicId, java.lang.String systemId)
            throws SAXException, java.io.IOException {
        // publicId is null when the DOCTYPE has only a SYSTEM identifier
        if ("--myDTDpublicID--".equals(publicId)) {
            // this deactivates the open office DTD
            return new InputSource(new ByteArrayInputStream("<?xml version='1.0' encoding='UTF-8'?>".getBytes()));
        } else {
            return null;
        }
    }
});

The user further mentions: "As you can see, when the parser hits the DTD, the entity resolver is called. I recognize my DTD with its specific ID and return an empty XML doc instead of the real DTD, stopping all validation..."

Hope this helps.



Answer 6:

Source XML (With DTD)

<!DOCTYPE MYSERVICE SYSTEM "./MYSERVICE.DTD">
<MYACCSERVICE>
   <REQ_PAYLOAD>
      <ACCOUNT>1234567890</ACCOUNT>
      <BRANCH>001</BRANCH>
      <CURRENCY>USD</CURRENCY>
      <TRANS_REFERENCE>201611100000777</TRANS_REFERENCE>
   </REQ_PAYLOAD>
</MYACCSERVICE>

Java DOM implementation that accepts the above XML as a String and removes the DTD declaration:

public Document removeDTDFromXML(String payload) throws Exception {

    System.out.println("### Payload received in XMlDTDRemover: " + payload);

    Document doc = null;
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    try {

        dbf.setValidating(false);
        dbf.setNamespaceAware(true);
        dbf.setFeature("http://xml.org/sax/features/namespaces", false);
        dbf.setFeature("http://xml.org/sax/features/validation", false);
        dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
        dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

        DocumentBuilder db = dbf.newDocumentBuilder();

        InputSource is = new InputSource();
        is.setCharacterStream(new StringReader(payload));
        doc = db.parse(is); 

    } catch (ParserConfigurationException e) {
        System.out.println("Parse Error: " + e.getMessage());
        return null;
    } catch (SAXException e) {
        System.out.println("SAX Error: " + e.getMessage());
        return null;
    } catch (IOException e) {
        System.out.println("IO Error: " + e.getMessage());
        return null;
    }
    return doc;

}

Destination XML (Without DTD)

<MYACCSERVICE>
   <REQ_PAYLOAD>
      <ACCOUNT>1234567890</ACCOUNT>
      <BRANCH>001</BRANCH>
      <CURRENCY>USD</CURRENCY>
      <TRANS_REFERENCE>201611100000777</TRANS_REFERENCE>
   </REQ_PAYLOAD>
</MYACCSERVICE>
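
The answer does not show how the parsed Document is turned back into the destination XML. One possible way, sketched here under the assumption that the standard javax.xml.transform API is used for serialization, is to drop the DocumentType node explicitly and then write the tree out. The class DoctypeStripper is hypothetical, and the empty entity resolver stands in for the feature flags used above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

// Hypothetical helper; not part of the original answer.
final class DoctypeStripper {

    /** Parses the payload without opening its DTD and serializes it without a DOCTYPE. */
    static String stripDoctype(String payload) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Empty entity resolver so the referenced DTD file is never opened.
        db.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
        Document doc = db.parse(new InputSource(new StringReader(payload)));

        // Remove the DOCTYPE node from the tree so it cannot be written out.
        DocumentType doctype = doc.getDoctype();
        if (doctype != null) {
            doc.removeChild(doctype);
        }

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```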