Obtaining DOCTYPE details using SAX (JDK 7)

Published 2019-06-01 03:00

Question:

I'm using the SAX parser that comes with JDK7. I'm trying to get hold of the DOCTYPE declaration, but none of the methods in DefaultHandler seem to be fired for it. What am I missing?

import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class Problem {

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE HTML><html><head></head><body></body></html>";
        SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
        InputSource in = new InputSource(new StringReader(xml));
        saxParser.parse(in, new DefaultHandler() {

            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) throws SAXException {
                System.out.println("Element: " + qName);
            }
        });
    }
}

This produces:

Element: html
Element: head
Element: body

I want it to produce:

DocType: HTML
Element: html
Element: head
Element: body

How do I get the DocType?


Update: Looks like there's a DefaultHandler2 class to extend. Can I use that as a drop-in replacement?

Answer 1:

Instead of DefaultHandler, use org.xml.sax.ext.DefaultHandler2. It extends DefaultHandler and also implements LexicalHandler, so it has a startDTD() method. From the startDTD() Javadoc:

Report the start of DTD declarations, if any. This method is intended to report the beginning of the DOCTYPE declaration; if the document has no DOCTYPE declaration, this method will not be invoked.

All declarations reported through DTDHandler or DeclHandler events must appear between the startDTD and endDTD events. Declarations are assumed to belong to the internal DTD subset unless they appear between startEntity and endEntity events. Comments and processing instructions from the DTD should also be reported between the startDTD and endDTD events, in their original order of (logical) occurrence; they are not required to appear in their correct locations relative to DTDHandler or DeclHandler events, however.

Note that the start/endDTD events will appear within the start/endDocument events from ContentHandler and before the first startElement event.
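To see the ordering described above, here is a minimal sketch that records event names instead of printing them. The class DtdEventOrder, the method collectEvents and the sample document are made up for illustration. The document has an internal DTD subset, so a DeclHandler event (elementDecl) should appear between startDTD and endDTD. The handler is registered under both the lexical-handler and declaration-handler properties, because without those registrations the DTD callbacks never fire.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

public class DtdEventOrder {

    /** Hypothetical helper: parses xml and returns SAX event names in firing order. */
    public static List<String> collectEvents(String xml) throws Exception {
        final List<String> events = new ArrayList<>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void startDocument() { events.add("startDocument"); }

            @Override
            public void endDocument() { events.add("endDocument"); }

            // LexicalHandler callbacks: bracket the DOCTYPE declaration
            @Override
            public void startDTD(String name, String publicId, String systemId) {
                events.add("startDTD " + name);
            }

            @Override
            public void endDTD() { events.add("endDTD"); }

            // DeclHandler callback: one call per <!ELEMENT ...> declaration
            @Override
            public void elementDecl(String name, String model) {
                events.add("elementDecl " + name);
            }

            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) {
                events.add("startElement " + qName);
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // Both extension handlers must be registered explicitly.
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        parser.setProperty("http://xml.org/sax/properties/declaration-handler", handler);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE note [<!ELEMENT note (#PCDATA)>]><note>hi</note>";
        for (String event : collectEvents(xml)) {
            System.out.println(event);
        }
    }
}
```

With the JDK's built-in parser this should print startDocument, startDTD note, elementDecl note, endDTD, startElement note and endDocument, one per line.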

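However, DefaultHandler2 is not quite a drop-in replacement. Passing it to parse() only registers it as the ContentHandler, DTDHandler, EntityResolver and ErrorHandler. To receive startDTD() you must also register it as the LexicalHandler of the underlying XMLReader, through the http://xml.org/sax/properties/lexical-handler property: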

import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.ext.DefaultHandler2;

public class Problem {

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE HTML><html><head></head><body></body></html>";
        SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
        InputSource in = new InputSource(new StringReader(xml));

        DefaultHandler2 myHandler = new DefaultHandler2(){
            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) throws SAXException {
                System.out.println("Element: " + qName);
            }

            @Override
            public void startDTD(String name, String publicId,
                    String systemId) throws SAXException {
                System.out.println("DocType: " + name);
            }
        };
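        // Lexical events (startDTD, endDTD, comment, ...) only reach a handler
        // registered under this property; parse() does not register it for you.
        saxParser.setProperty("http://xml.org/sax/properties/lexical-handler",
                myHandler);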
        saxParser.parse(in, myHandler);
    }
}
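If you prefer to work with the XMLReader directly, you can do the same registration there. SAXParser.setProperty is documented as setting the property on the underlying XMLReader, so the two forms should be equivalent. The sketch below uses the XML from the question and collects the output lines rather than printing them immediately. The class DocTypeViaXmlReader and the method describe are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class DocTypeViaXmlReader {

    /** Hypothetical helper: returns the lines the handler would print for xml. */
    public static List<String> describe(String xml) throws Exception {
        final List<String> out = new ArrayList<>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void startDTD(String name, String publicId, String systemId) {
                out.add("DocType: " + name);
            }

            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) {
                out.add("Element: " + qName);
            }
        };

        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        // Without this, startDTD() is never called even though handler is a DefaultHandler2.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE HTML><html><head></head><body></body></html>";
        for (String line : describe(xml)) {
            System.out.println(line);
        }
    }
}
```

For the question's document this should yield "DocType: HTML" followed by "Element: html", "Element: head" and "Element: body". SAXParser.parse(InputSource, DefaultHandler) also installs the handler as ErrorHandler, DTDHandler and EntityResolver, but XMLReader.parse does not. If you need those callbacks, register them with setErrorHandler, setDTDHandler and setEntityResolver.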