Handling embedded images in IText XMLWorker

2019-07-18 11:23发布

问题:

Handling embedded images in IText XMLWorker.

Is there a way to handle Embedded (Base64) images in XMLWorker? In version 5.3.5 the ImageProvider I used does not work any more (an exception is raised before), so I Patched ImageRetrieve as follows, but obviously this will be broken in next XMLWorker update:

package com.itextpdf.tool.xml.net;

import java.io.File;
import java.io.IOException;
import java.net.MalformedURLException;

import com.itextpdf.text.BadElementException;
import com.itextpdf.text.Image;
import com.itextpdf.text.pdf.codec.Base64;
import com.itextpdf.tool.xml.net.exc.NoImageException;
import com.itextpdf.tool.xml.pipeline.html.ImageProvider;
import java.util.regex.Matcher;
import java.util.regex.Pattern;


/**
 * @author redlab_b
 *
 */
public class ImageRetrieve {
    final static Pattern INLINE_PATTERN = Pattern.compile("^/data:image/(png|jpg|gif);base64,(.*)");

    private final ImageProvider provider;
    /**
     * @param imageProvider the provider to use.
     *
     */
    public ImageRetrieve(final ImageProvider imageProvider) {
        this.provider = imageProvider;
    }
    /**
     *
     */
    public ImageRetrieve() {
        this.provider = null;
    }
    /**
     * @param src an URI that can be used to retrieve an image
     * @return an iText Image object
     * @throws NoImageException if there is no image
     * @throws IOException if an IOException occurred
     */
    public com.itextpdf.text.Image retrieveImage(final String src) throws NoImageException, IOException {
        com.itextpdf.text.Image img = null;
        if (null != provider) {
            img = provider.retrieve(src);
        }

        if (null == img) {
            String path = null;
            if (src.startsWith("http")) {
                // full url available
                path = src;
            } else if (null != provider){
                String root = this.provider.getImageRootPath();
                if (null != root) {
                    if (root.endsWith("/") && src.startsWith("/")) {
                        root = root.substring(0, root.length() - 1);
                    }
                    path = root + src;
                }
            } else {
                path = src;
            }
            if (null != path) {
                try {
                  Matcher m;
                    if (path.startsWith("http")) {
                        img = com.itextpdf.text.Image.getInstance(path);
                    } else if ((m = INLINE_PATTERN.matcher(path)).matches()) {
                      // Let's handle the embedded image without saving it
                      try {
                        byte[] data = Base64.decode(m.group(2));
                        return Image.getInstance(data);
                      } catch (Exception ex) {
                        throw new NoImageException(src, ex);
                      }
                    } else {
                        img = com.itextpdf.text.Image.getInstance(new File(path).toURI().toURL());
                    }
                    if (null != provider && null != img) {
                        provider.store( src, img);
                    }
                } catch (BadElementException e) {
                    throw new NoImageException(src, e);
                } catch (MalformedURLException e) {
                    throw new NoImageException(src, e);
                }
            } else {
                throw new NoImageException(src);
            }
        }
        return img;
    }



}

回答1:

It's almost year since you asked this question, but maybe this answer will help anyway.

Recently I met similar problem. My goal was to include in generated pdf an image stored in database.

To do this I've extended the com.itextpdf.tool.xml.pipeline.html.AbstractImageProvider class and overriden its retrieve() method like this:

public class MyImageProvider extends AbstractImageProvider {

  @Override
  public Image retrieve(final String src) {
    Image img = super.retrieve(src);
    if (img == null) {
      try {
        byte [] data = getMyImageSomehow(src);
        img = Image.getInstance(data);
        super.store(src, img);
      }
      catch (Exception e) {
        //handle exceptions
      }
    }
    return img;
  }

  @Override
  public String getImageRootPath() {
    return "http://sampleurl/img";
  }
}

Then, when building pipelines for XMLWorker [1], I pass an instance of my class to context:

htmlPipelineContext.setImageProvider(new MyImageProvider());

Now we would expect that this should work. But there's a catch! Somewhere, deep inside the xmlworker library, this htmlPipelineContext is being cloned. And during this operation our implementation of ImageProvider get's lost. This is happening inside HtmlPipelineContext's clone() method. Take a look at lines 274-280 (I refer to 5.4.4 version):

final String rootPath =  imageProvider.getImageRootPath();
newCtx.setImageProvider(new AbstractImageProvider() {
  public String getImageRootPath() {
    return rootPath;
  }
});

This is even described in HtmlPipelineContext.clone()'s javadoc [2]:

Create a clone of this HtmlPipelineContext, the clone only contains the initial values, not the internal values. Beware, the state of the current Context is not copied to the clone. Only the configurational important stuff like the (...) ImageProvider (new AbstractImageProvider with same ImageRootPath) , (...) are copied.

Isn't it funny? You get class that is designed for extending by making it abstract, but at the end it turns out, that this class serves only as a property holder.

My workaround for this:

public class MySpecialImageProviderAwareHtmlPipelineContext extends HtmlPipelineContext {

  MySpecialImageProviderAwareHtmlPipelineContext () {
    super(null);
  }

  public HtmlPipelineContext clone () {
    HtmlPipelineContext ctx = null;
    try {
      ctx = super.clone();
      ctx.setImageProvider(new MyImageProvider());
    } catch (Exception e) {
      //handle exception
    }
    return ctx;
  }

}

Then I just use this instead of HtmlPipelineContext.


[1] http://demo.itextsupport.com/xmlworker/itextdoc/flatsite.html#itextdoc-menu-7

[2] http://api.itextpdf.com/xml/com/itextpdf/tool/xml/pipeline/html/HtmlPipelineContext.html#clone()



回答2:

And hopefully, your solution seems to have been adopted in later version (5.5.6 at least).