I have some HTML content (including formatting tags such as strong, images, etc.). In my Java code, I want to convert this HTML content into a PDF document without losing the HTML formatting.
Is there any way to do it in Java (using iText or any other library)?
I would try DocRaptor.com. It is a web service that converts HTML to PDF or HTML to XLS and can be called from any language. Since it uses Prince XML (without making you pay the expensive license fee), the quality is a lot better than the other options out there. It's also a web app, so there's nothing to download, which makes it an easy way to avoid long, frustrating coding.
Here are some examples:
https://docraptor.com/documentation#coding_examples
I used ITextRenderer from the Flying Saucer project.
Here is a short, self-contained, working example.
In my case I wanted to later stream the bytes into an email attachment.
So, in the example I write it to a file purely for the sake of demonstration for this question. This is Java 8.
import com.lowagie.text.DocumentException;
import org.apache.commons.io.FileUtils;
import org.xhtmlrenderer.pdf.ITextRenderer;

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;

public class So4712641 {

    public static void main(String... args) throws DocumentException, IOException {
        // Written to a file only for demonstration; the bytes could go anywhere.
        FileUtils.writeByteArrayToFile(new File("So4712641.pdf"), toPdf("<b>You gotta walk and don't look back</b>"));
    }

    /**
     * Generate a PDF document
     * @param html HTML as a string (it is parsed as XML, so it must be well-formed)
     * @return bytes of PDF document
     */
    private static byte[] toPdf(String html) throws DocumentException, IOException {
        final ITextRenderer renderer = new ITextRenderer();
        renderer.setDocumentFromString(html);
        renderer.layout();
        // The HTML length is only a rough initial capacity; the buffer grows as needed.
        try (ByteArrayOutputStream out = new ByteArrayOutputStream(html.length())) {
            renderer.createPDF(out);
            return out.toByteArray();
        }
    }
}
This gives me a PDF file, So4712641.pdf, containing that sentence in bold.
For completeness, here are the relevant pieces of my Maven pom.xml:
<dependencies>
    <dependency>
        <groupId>org.xhtmlrenderer</groupId>
        <artifactId>flying-saucer-pdf</artifactId>
        <version>9.0.8</version>
    </dependency>
    <dependency>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
        <version>2.4</version>
    </dependency>
</dependencies>
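One caveat with the Flying Saucer approach: the renderer parses its input as XML, so markup that browsers tolerate (an unclosed br, for instance) is rejected. Below is a rough sketch of a pre-check that uses only the JDK's XML parser. The class name XhtmlCheck and the method isWellFormed are made up for illustration and are not part of Flying Saucer.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of any library.
final class XhtmlCheck {

    /** Returns true if the markup parses as well-formed XML, false otherwise. */
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the markup, e.g. a mismatched or unclosed tag.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            // Not expected when parsing an in-memory string with default settings.
            throw new IllegalStateException(e);
        }
    }
}
```

With this, XhtmlCheck.isWellFormed("<p>one<br/>two</p>") passes, while the same fragment with a bare <br> does not. If the HTML comes from an uncontrolled source, it probably needs to be tidied into XHTML before it is handed to the renderer.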
Converting HTML to PDF isn't exactly straightforward in general, but if you're in control of what goes into the HTML, you can try using an XSL-FO implementation, like Apache FOP.
It's not out-of-the-box as you'll have to write (or find) a stylesheet that defines the conversion rules, but on the upside it gives you much more control over output formatting, which is quite useful as what looks good on screen doesn't necessarily look good on paper.
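To give an idea of what such a stylesheet involves, here is a minimal sketch of the first half of that pipeline, using only the XSLT support built into the JDK. The class name HtmlToFo, the method toXslFo and the stylesheet itself are invented for illustration. The stylesheet handles only p and strong/b, and it assumes well-formed input with no XHTML namespace declared; a real one would need rules for every element you expect.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: turns a small subset of XHTML into XSL-FO.
final class HtmlToFo {

    // Illustrative stylesheet: one A4 page master, <p> becomes a block,
    // <strong>/<b> become bold inlines. Everything else is ignored or
    // reduced to its text content.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:fo='http://www.w3.org/1999/XSL/Format'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'>"
        + "<fo:root>"
        + "<fo:layout-master-set>"
        + "<fo:simple-page-master master-name='A4' page-height='29.7cm' page-width='21cm'>"
        + "<fo:region-body/>"
        + "</fo:simple-page-master>"
        + "</fo:layout-master-set>"
        + "<fo:page-sequence master-reference='A4'>"
        + "<fo:flow flow-name='xsl-region-body'>"
        + "<xsl:apply-templates select='html/body/*'/>"
        + "</fo:flow>"
        + "</fo:page-sequence>"
        + "</fo:root>"
        + "</xsl:template>"
        + "<xsl:template match='p'>"
        + "<fo:block space-after='6pt'><xsl:apply-templates/></fo:block>"
        + "</xsl:template>"
        + "<xsl:template match='strong|b'>"
        + "<fo:inline font-weight='bold'><xsl:apply-templates/></fo:inline>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to well-formed XHTML and returns the XSL-FO markup. */
    static String toXslFo(String xhtml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Could not transform input to XSL-FO", e);
        }
    }
}
```

Calling HtmlToFo.toXslFo("<html><body><p>Some <strong>bold</strong> text</p></body></html>") returns an fo:root document whose flow holds one fo:block. That XSL-FO would then be passed to an XSL-FO processor such as Apache FOP to produce the PDF; that second step is not shown here.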