I'm using the SAX parser that ships with JDK 7. I'm trying to get hold of the DOCTYPE declaration, but none of the methods in DefaultHandler
seem to fire for it. What am I missing?
    import java.io.StringReader;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.Attributes;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;

    public class Problem {
        public static void main(String[] args) throws Exception {
            String xml = "<!DOCTYPE HTML><html><head></head><body></body></html>";
            SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
            InputSource in = new InputSource(new StringReader(xml));
            saxParser.parse(in, new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName,
                        Attributes attributes) throws SAXException {
                    System.out.println("Element: " + qName);
                }
            });
        }
    }
This produces:
Element: html
Element: head
Element: body
I want it to produce:
DocType: HTML
Element: html
Element: head
Element: body
How do I get the DocType?
Update: Looks like there's a DefaultHandler2 class to extend. Can I use that as a drop-in replacement?
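Instead of DefaultHandler, use org.xml.sax.ext.DefaultHandler2, which has the startDTD() method. The Javadoc for startDTD(), from the LexicalHandler interface that DefaultHandler2 implements, reads: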
Report the start of DTD declarations, if any. This method is intended
to report the beginning of the DOCTYPE declaration; if the document
has no DOCTYPE declaration, this method will not be invoked.
All declarations reported through DTDHandler or DeclHandler events
must appear between the startDTD and endDTD events. Declarations are
assumed to belong to the internal DTD subset unless they appear
between startEntity and endEntity events. Comments and processing
instructions from the DTD should also be reported between the startDTD
and endDTD events, in their original order of (logical) occurrence;
they are not required to appear in their correct locations relative to
DTDHandler or DeclHandler events, however.
Note that the start/endDTD events will appear within the
start/endDocument events from ContentHandler and before the first
startElement event.
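However, extending DefaultHandler2 is not enough on its own. The parser only delivers startDTD() to a handler registered as its LexicalHandler, so you must also set the http://xml.org/sax/properties/lexical-handler property on the parser (or on its XMLReader).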
    import java.io.StringReader;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.Attributes;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;
    import org.xml.sax.ext.DefaultHandler2;

    public class Problem {
        public static void main(String[] args) throws Exception {
            String xml = "<!DOCTYPE html><html><img/></html>";
            SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
            InputSource in = new InputSource(new StringReader(xml));
            DefaultHandler2 myHandler = new DefaultHandler2() {
                @Override
                public void startElement(String uri, String localName, String qName,
                        Attributes attributes) throws SAXException {
                    System.out.println("Element: " + qName);
                }

                // Called for the DOCTYPE declaration, but only if this handler
                // is registered as the lexical handler (see setProperty below).
                @Override
                public void startDTD(String name, String publicId,
                        String systemId) throws SAXException {
                    System.out.println("DocType: " + name);
                }
            };
            // Register the same object as the LexicalHandler.
            saxParser.setProperty("http://xml.org/sax/properties/lexical-handler",
                    myHandler);
            saxParser.parse(in, myHandler);
        }
    }
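To check the ordering that the quoted Javadoc describes (startDTD and endDTD inside startDocument/endDocument and before the first startElement), here is a minimal sketch that records events in a list instead of printing them directly. The class name DoctypeOrderSketch and the helper collectEvents are made up for illustration. As a variation on the code above, it registers the handler on the underlying XMLReader rather than through SAXParser.setProperty. It assumes a DOCTYPE with no external identifier, so the parser has no external DTD to fetch.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class DoctypeOrderSketch {

    // Hypothetical helper: parses the XML string and returns the
    // recorded events in the order the parser reported them.
    public static List<String> collectEvents(String xml) throws Exception {
        final List<String> events = new ArrayList<String>();

        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void startDocument() {
                events.add("startDocument");
            }

            @Override
            public void startDTD(String name, String publicId, String systemId) {
                events.add("DocType: " + name);
            }

            @Override
            public void endDTD() {
                events.add("endDTD");
            }

            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes attributes) {
                events.add("Element: " + qName);
            }

            @Override
            public void endDocument() {
                events.add("endDocument");
            }
        };

        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Content events (startElement etc.) go to the content handler.
        reader.setContentHandler(handler);
        // Lexical events (startDTD etc.) go only to the registered lexical handler.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : collectEvents("<!DOCTYPE html><html><img/></html>")) {
            System.out.println(event);
        }
    }
}
```

If the lexical-handler property is left unset, the "DocType" entry should simply be missing from the list, which is the behaviour described in the question.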