I use the Java (6) XML API to apply an XSLT transformation to an HTML document from the web. This document is well-formed XHTML and so contains a valid DTD declaration (<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">).
Now a problem occurs: upon transformation, the XSLT processor tries to download the DTD, and the W3C server rejects the request with an HTTP 503 error (due to bandwidth limiting by the W3C).
How can I prevent the XSLT processor from downloading the DTD? I don't need my input document validated.
Source is:
import java.io.StringReader;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
--
String xslt = "<?xml version=\"1.0\"?>"+
"<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"+
" <xsl:output method=\"text\" />"+
" <xsl:template match=\"//html/body//div[@id='bodyContent']/p[1]\"> "+
" <xsl:value-of select=\".\" />"+
" </xsl:template>"+
" <xsl:template match=\"text()\" />"+
"</xsl:stylesheet>";
try {
Source xmlSource = new StreamSource("http://de.wikipedia.org/wiki/Right_Livelihood_Award");
Source xsltSource = new StreamSource(new StringReader(xslt));
TransformerFactory ft = TransformerFactory.newInstance();
Transformer trans = ft.newTransformer(xsltSource);
trans.transform(xmlSource, new StreamResult(System.out));
}
catch (Exception e) {
e.printStackTrace();
}
I read the following questions here on SO, but they all use a different XML API:
Thanks!
You need to use javax.xml.parsers.DocumentBuilderFactory.
Try setting a feature in your DocumentBuilderFactory:
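A minimal sketch of what setting such a feature could look like, assuming the JDK's built-in Xerces-based parser is in use. The feature URI below is Xerces-specific, so another parser may reject it with a ParserConfigurationException. The class name NoDtdDomTransform and its transform method are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: parse the input ourselves, then hand a DOM to the transformer.
class NoDtdDomTransform {
    // Xerces-specific feature name (assumption: the default JDK parser is used).
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    static String transform(InputSource xml, String xslt) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);              // XSLT expects a namespace-aware DOM
        dbf.setValidating(false);                 // no DTD validation
        dbf.setFeature(LOAD_EXTERNAL_DTD, false); // do not fetch the external DTD
        DocumentBuilder builder = dbf.newDocumentBuilder();
        Document doc = builder.parse(xml);

        Transformer trans = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        // The document is already parsed, so the transformer never sees the DOCTYPE.
        trans.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

With the code from the question, this could be called as NoDtdDomTransform.transform(new InputSource("http://de.wikipedia.org/wiki/Right_Livelihood_Award"), xslt). Two caveats: named entities declared only in the DTD (such as &nbsp;) cannot be expanded when the DTD is skipped, and if the page declares the XHTML namespace, unprefixed match patterns like html/body will probably need a namespace prefix.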
Right now I'm experiencing the same problem in XSLT 2 when calling the document() function to analyse external XHTML pages.
I recently had this issue while unmarshalling XML using JAXB. The answer was to create a SAXSource from an XMLReader and an InputSource, then pass that to the JAXB Unmarshaller's unmarshal() method. To avoid loading the external DTD, I set a custom EntityResolver on the XMLReader.
As written, this custom entity resolver will throw an exception if it's ever asked to resolve an entity OTHER than the one you want it to resolve. If you just want it to go ahead and load the remote entity, remove the "throws" line.
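A minimal sketch of the resolver approach described above, applied to the Transformer from the question rather than to JAXB, might look like this. The class names DtdSkippingResolver and SaxSourceTransform and the dtdSystemId parameter are hypothetical. The resolver answers the one expected DTD request with an empty stand-in and rejects everything else.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical resolver: substitutes an empty DTD for one known system ID.
class DtdSkippingResolver implements EntityResolver {
    private final String expectedSystemId;

    DtdSkippingResolver(String expectedSystemId) {
        this.expectedSystemId = expectedSystemId;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        if (expectedSystemId.equals(systemId)) {
            // Empty stand-in, so nothing is downloaded.
            return new InputSource(new StringReader(""));
        }
        // Returning null here instead would let the parser fetch the entity as usual.
        throw new SAXException("Unexpected external entity: " + systemId);
    }
}

// Hypothetical helper: feed the transformer through our own XMLReader.
class SaxSourceTransform {
    static String transform(InputSource xml, String xslt, String dtdSystemId)
            throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setEntityResolver(new DtdSkippingResolver(dtdSystemId));

        Transformer trans = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        // SAXSource makes the transformer parse the input with our configured reader.
        trans.transform(new SAXSource(reader, xml), new StreamResult(out));
        return out.toString();
    }
}
```

For the document in the question, dtdSystemId would be "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd". For JAXB, the same SAXSource could be passed to unmarshal() instead of to the transformer.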
The previous answers led me to a solution, but it wasn't obvious to me, so here is a complete one:
If you use
you can try to disable DTD validation with the following code: