Relevant code; barfs on instantiating the SAXSource:
TransformerFactory factory = TransformerFactory.newInstance();
XMLReader xmlReader = XMLReaderFactory.createXMLReader("org.ccil.cowan.tagsoup.Parser");
Source input = new SAXSource(xmlReader, "http://books.toscrape.com/");
Result output = new StreamResult(System.out);
factory.newTransformer().transform(input, output);
The Javadoc says:
public SAXSource(XMLReader reader, InputSource inputSource)
Create a SAXSource, using an XMLReader and a SAX InputSource. The Transformer or SAXTransformerFactory will set itself to be the reader's ContentHandler, and then will call reader.parse(inputSource).
Looking at InputSource shows:
InputSource(InputStream byteStream)
Create a new input source with a byte stream.
InputSource(Reader characterStream)
Create a new input source with a character stream.
So would this entail, for example, a character stream to read in the HTML for the InputSource?
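To make that idea concrete, here is a minimal sketch of wrapping a character stream in an InputSource and handing it to SAXSource. It rests on some assumptions: the class name IdentityTransformSketch and the helper identity() are made up for illustration, the markup comes from an in-memory string rather than the URL, and the JDK's default SAX parser stands in for TagSoup so that the sketch runs without extra jars (so the sample input has to be well-formed XML). As far as I can tell, org.ccil.cowan.tagsoup.Parser implements XMLReader and could be dropped in at the marked spot.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class IdentityTransformSketch {

    // Hypothetical helper: runs an identity transform over the markup
    // and returns the serialized result as a String.
    static String identity(String markup) throws Exception {
        // Assumption: the JDK's default SAX parser stands in for TagSoup here.
        // With TagSoup on the classpath, an org.ccil.cowan.tagsoup.Parser
        // instance should be usable as the XMLReader instead.
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        parserFactory.setNamespaceAware(true);
        XMLReader xmlReader = parserFactory.newSAXParser().getXMLReader();

        // SAXSource takes an InputSource, not a String, so wrap a character stream.
        // InputSource also has a constructor taking a system ID (URI) string,
        // which may be the simpler route when the input is a URL.
        InputSource inputSource = new InputSource(new StringReader(markup));

        Source input = new SAXSource(xmlReader, inputSource);
        StringWriter buffer = new StringWriter();
        Result output = new StreamResult(buffer);

        // newTransformer() with no stylesheet yields an identity transformer.
        TransformerFactory.newInstance().newTransformer().transform(input, output);
        return buffer.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(identity("<catalogue><book>A Light in the Attic</book></catalogue>"));
    }
}
```

The design point is only that the second constructor argument has to be an InputSource; whether it is built from a Reader, an InputStream, or a system ID string depends on where the HTML comes from.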
Would TagSoup be better used for this identity transform? But how?