HTML to PDF with base64 images throws FileNotFound

Published 2019-01-09 18:14

Question:

I'm using itextpdf-5.0.6.jar (Java 8), and when I try to export HTML that contains an image tag with a base64 source, I get a file-not-found exception.

If I remove the image tag, everything works.

I found a few solutions that override the image tag processor, but most of them are old and not compatible with version 5.0.6.

Here is the HTML I send:

    "<!doctype html>\n<html lang=\"en\">\n<head>\n    
    <meta charset=\"UTF-8\">\n    
    <title>Test PDF</title>\n</head>\n<body>\n\n
    <div class=\"pdf-header\">\n\n 
      <img src=\"\">     \n\n\n</div>\n\n<div class=\"main\">\n<div class=\"canvas\">\nHellow world</div></div></body>\n</html>"

Part of my code:

    fileOutputStream = new FileOutputStream(file);
    Document document = new Document();
    PdfWriter.getInstance(document, fileOutputStream);
    document.open();
    HTMLWorker htmlWorker = new HTMLWorker(document);
    StringReader stringReader = new StringReader(htmlCode);
    htmlWorker.parse(stringReader);
    document.close();
    fileOutputStream.close();

Any help will be appreciated, thanks.

Answer 1:

Please stop using HTMLWorker. As has been repeated many times on StackOverflow, the HTMLWorker class was abandoned in favor of XML Worker a long time ago. We won't invest in further development of HTMLWorker, so it's a very bad choice to use it. Please switch to XML Worker.

Also upgrade to the latest iText version. The version you are using dates from February 4, 2011, and many bugs have been fixed in the 4 years that have passed. Make sure the iText jar and the XML Worker jar have the same version number.

Base64 images aren't supported yet, but I have made you a very simple Proof of Concept, showing how easy it is to add support for such images. Take a look at the ParseHtml4 example and the resulting PDF: html_4.pdf.

To achieve this, you need to write an implementation of the ImageProvider interface. I have done this by extending the AbstractImageProvider class:

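    import java.io.IOException;

    import com.itextpdf.text.BadElementException;
    import com.itextpdf.text.Image;
    import com.itextpdf.text.pdf.codec.Base64;
    import com.itextpdf.tool.xml.pipeline.html.AbstractImageProvider;

    class Base64ImageProvider extends AbstractImageProvider {

        @Override
        public Image retrieve(String src) {
            // Position of the marker that precedes the encoded payload in a data URI.
            int pos = src.indexOf("base64,");
            try {
                if (src.startsWith("data") && pos > 0) {
                    // Decode everything after "base64," (7 characters) into raw image bytes.
                    byte[] img = Base64.decode(src.substring(pos + 7));
                    return Image.getInstance(img);
                }
                else {
                    // Not a base64 data URI: let iText load src as a file path or URL.
                    return Image.getInstance(src);
                }
            } catch (BadElementException ex) {
                // The image could not be created; signal failure with null.
                return null;
            } catch (IOException ex) {
                return null;
            }
        }

        @Override
        public String getImageRootPath() {
            return null;
        }
    }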

As you can see, I check for the existence of "base64," in whatever is passed to XML Worker through the src attribute of the img tag. If that String is present, I decode whatever follows that "base64," and I return an Image object that is created using the resulting bytes.
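The check-and-decode step can also be tried in isolation, without iText on the classpath. The following is only a minimal sketch under a few assumptions: it uses the JDK's `java.util.Base64` (Java 8 and later) instead of iText's own `Base64` class, and the class name `DataUriDecoder` and its `decode` method are hypothetical names chosen for illustration. The MIME decoder is used because it ignores line breaks that sometimes end up inside long attribute values.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of iText: isolates the data-URI handling.
final class DataUriDecoder {

    private static final String MARKER = "base64,";

    /**
     * Returns the decoded payload of a base64 data URI, or null when src
     * is not such a URI (for example a file path or an http URL).
     */
    public static byte[] decode(String src) {
        int pos = src.indexOf(MARKER);
        if (!src.startsWith("data") || pos <= 0) {
            return null;
        }
        // Everything after the marker is the encoded payload.
        String payload = src.substring(pos + MARKER.length());
        return Base64.getMimeDecoder().decode(payload);
    }

    public static void main(String[] args) {
        // A tiny text payload stands in for real image data here.
        byte[] bytes = decode("data:text/plain;base64,SGVsbG8=");
        System.out.println(new String(bytes, StandardCharsets.UTF_8));

        // A plain path is not a data URI, so the helper returns null.
        System.out.println(decode("images/logo.png") == null);
    }
}
```

Inside `retrieve`, the returned bytes would be handed to `Image.getInstance(byte[])`, and a null result would fall through to `Image.getInstance(src)`, mirroring the two branches of the provider above.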

Once you have this ImageProvider implementation, it's only a matter of passing it to XML Worker: set it on the HtmlPipelineContext with setImageProvider() before you build the HTML pipeline.