I'm trying to use an HTML file as a template for a PDF, but Flying Saucer isn't recognizing the HTML5 entities (&trade;, &nbsp;, etc.). If I replace them with their hexadecimal character references (such as &#x2122;), the program runs fine.
My code is as follows:
public static InputStream create(String content) throws PDFUtilException {
    try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
        ITextRenderer iTextRenderer = new ITextRenderer();
        // Wrap the default replaced-element factory in the custom MediaReplacedElementFactory
        iTextRenderer.getSharedContext()
                .setReplacedElementFactory(new MediaReplacedElementFactory(
                        iTextRenderer.getSharedContext().getReplacedElementFactory()));
        // The string is parsed as XML here, so undeclared entities such as &trade; are rejected
        iTextRenderer.setDocumentFromString(closeOutTags(content), null);
        iTextRenderer.layout();
        iTextRenderer.createPDF(baos);
        return new ByteArrayInputStream(baos.toByteArray());
    } catch (IOException | DocumentException e) {
        throw new PDFUtilException("Unable to create PDF", e);
    }
}
Thanks,
Oliver
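The hex-value workaround mentioned in the question could be automated as a pre-processing step before rendering. This is a minimal sketch, assuming a small hand-maintained map that covers only the entities your templates actually use; the class name EntityReplacer, the method toNumericReferences, and the map contents are all hypothetical, and it needs Java 9 or later (Map.of, Matcher.appendReplacement with a StringBuilder).

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class EntityReplacer {
    // Hypothetical, hand-maintained subset of HTML named entities mapped to Unicode code points.
    private static final Map<String, Integer> NAMED = Map.of(
            "nbsp", 0x00A0,
            "copy", 0x00A9,
            "reg", 0x00AE,
            "mdash", 0x2014,
            "trade", 0x2122);

    // A named reference such as &trade; (a letter, then letters/digits, terminated by a semicolon).
    private static final Pattern ENTITY = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    private EntityReplacer() {
    }

    /** Rewrites known named entities as hexadecimal character references; leaves all others as they are. */
    public static String toNumericReferences(String markup) {
        Matcher m = ENTITY.matcher(markup);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Integer codePoint = NAMED.get(m.group(1));
            // Unknown names (and XML's own amp, lt, gt, quot, apos) are copied through unchanged.
            String replacement = codePoint == null ? m.group() : String.format("&#x%X;", codePoint);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

It could be wired in as, for example, iTextRenderer.setDocumentFromString(closeOutTags(EntityReplacer.toNumericReferences(content)), null). Any entity missing from the map is passed through untouched and will still be rejected by the XML parser, so this scales worse than declaring the entities once in the document.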
Michael is correct in saying that Flying Saucer needs well-formed XML, but if your only problem is the predefined HTML entities (which aren't part of XML), then you can declare them yourself in a DOCTYPE declaration at the beginning of your document.
The idea is to declare a parameter entity (called htmlentities) that pulls in the entity declarations from their official URL, and then to reference (i.e. "execute") that parameter entity inside the DOCTYPE's internal subset, which makes all of the pulled-in declarations available. If you only need &trade; and &nbsp;, or if Flying Saucer won't allow you to access URLs on the net, you can declare them manually instead, e.g. <!ENTITY trade "&#x2122;"> and <!ENTITY nbsp "&#xA0;">.

Now, if you actually have a proper HTML (not XHTML) file, you won't be able to use an XML processor on it directly, because HTML uses markup features that XML doesn't support (for example, empty elements such as the img element, omitted tags, and attribute shortforms). But you can use an SGML processor to first convert the HTML to XHTML (XML), and then run Flying Saucer on the resulting XML file. (SGML is a superset of both HTML and XML, and the original markup language on which both are based.) The process involves an HTML DTD grammar, such as the original W3C HTML 4 DTD (from 1999) or my HTML5 DTD on sgmljs.net, plus an SGML processor. Before going into details, though, first check whether merely adding entity declarations as described above solves your problem.

I'd never heard of Flying Saucer until today, but the first sentence of its documentation says "Flying Saucer is a pure-Java library for rendering arbitrary well-formed XML (or XHTML)", which strongly suggests that it expects well-formed XML input rather than HTML.
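To try the manual-declaration approach without Flying Saucer in the loop, here is a minimal sketch using the JDK's own DOM parser (javax.xml.parsers), which, like Flying Saucer, only accepts well-formed XML. It prepends an internal DTD subset declaring &trade; and &nbsp;, and shows the same markup being rejected without it and parsed with it. The class and method names are illustrative assumptions, and the sketch assumes your template has no XML declaration or DOCTYPE of its own. I haven't checked it against a particular Flying Saucer version, but the string returned by withEntityDeclarations is what you would hand to ITextRenderer.setDocumentFromString.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public final class EntityDeclarations {
    // Internal DTD subset defining the two entities from the question via numeric character references.
    private static final String DOCTYPE = "<!DOCTYPE html [\n"
            + "  <!ENTITY trade \"&#x2122;\">\n"
            + "  <!ENTITY nbsp \"&#xA0;\">\n"
            + "]>\n";

    private EntityDeclarations() {
    }

    /** Prepends the declarations; assumes the markup has no XML declaration or DOCTYPE of its own. */
    public static String withEntityDeclarations(String xhtml) {
        return DOCTYPE + xhtml;
    }

    /** Parses the string as XML, throwing IllegalArgumentException if it is not well-formed. */
    public static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("Not well-formed XML: " + e.getMessage(), e);
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String body = "<html><body><p>Widget&trade;&nbsp;Pro</p></body></html>";
        try {
            parse(body);
        } catch (IllegalArgumentException e) {
            System.out.println("Without declarations: " + e.getMessage());
        }
        Document doc = parse(withEntityDeclarations(body));
        System.out.println("With declarations: " + doc.getDocumentElement().getTextContent());
    }
}
```

The parameter-entity variant works the same way, except that the internal subset declares and references htmlentities instead of listing each entity, which requires the parser to be allowed to fetch the external file.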