I'm using iText to extract embedded images and save them as separate files. The .jpg and .png files come out ok, but I cannot extract tiff images that have the CCITTFaxDecode encoding.
Does anyone have a way of saving the tiff files?
I found some sample C# code that uses iTextSharp at
Extracting image from PDF with /CCITTFaxDecode filter
It indicates a separate tiff library is needed to write out the results. According to that article, the "CCITTFaxDecode" compression is Compression.CCITTFAX4 for the tiff library.
To use that article's method, I need to:
1. Get a TIFF library.
The Java Image I/O API can read and write TIFF files, among other formats. TIFF support is built in from Java 9 onwards; earlier versions need a plugin such as JAI Image I/O.
BufferedImage image = ImageIO.read( new File( "image.tif" ) );
2. Find the Java equivalent of the code that reads the bitmap's properties from the PDF, for example:
pd.Get(PdfName.WIDTH).ToString() (which is in C#)
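For step 2, the Java side can presumably read the same dictionary entries the way the listener further down reads the filter, i.e. `image.get(PdfName.WIDTH)` and `image.get(PdfName.HEIGHT)`. Once the width, height and raw stream bytes are in hand, the article's idea is to put a TIFF header in front of the untouched CCITT data. The article does that with a TIFF library. The sketch below instead writes a minimal header by hand with plain `java.nio`, which is a swapped-in technique and not the article's code. `RawG4TiffWrapper` and `wrap` are hypothetical names. It assumes Group 4 data (a negative /K in the PDF decode parameters) in a single strip, and it assumes the WhiteIsZero photometric value, which may need flipping depending on /BlackIs1.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper (not part of iText): wraps raw CCITT Group 4 data
// in a minimal single-strip, little-endian TIFF file.
final class RawG4TiffWrapper {
    private static final int SHORT = 3;  // TIFF field type: 16-bit unsigned
    private static final int LONG = 4;   // TIFF field type: 32-bit unsigned
    private static final int ENTRY_COUNT = 9;
    // 8-byte header + 2-byte entry count + 12 bytes per entry + 4-byte next-IFD offset
    private static final int DATA_OFFSET = 8 + 2 + ENTRY_COUNT * 12 + 4;

    static byte[] wrap(int width, int height, byte[] g4Data) {
        ByteBuffer buf = ByteBuffer.allocate(DATA_OFFSET + g4Data.length)
                .order(ByteOrder.LITTLE_ENDIAN);
        // Header: "II" (little-endian), magic number 42, offset of the first IFD
        buf.put((byte) 'I').put((byte) 'I').putShort((short) 42).putInt(8);
        // IFD entries, sorted by tag number as the TIFF format requires
        buf.putShort((short) ENTRY_COUNT);
        entry(buf, 256, LONG, width);            // ImageWidth
        entry(buf, 257, LONG, height);           // ImageLength
        entry(buf, 258, SHORT, 1);               // BitsPerSample
        entry(buf, 259, SHORT, 4);               // Compression: 4 = CCITT T.6 (Group 4)
        entry(buf, 262, SHORT, 0);               // Photometric: 0 = WhiteIsZero (assumption)
        entry(buf, 273, LONG, DATA_OFFSET);      // StripOffsets
        entry(buf, 277, SHORT, 1);               // SamplesPerPixel
        entry(buf, 278, LONG, height);           // RowsPerStrip: whole image in one strip
        entry(buf, 279, LONG, g4Data.length);    // StripByteCounts
        buf.putInt(0);                           // no further IFDs
        buf.put(g4Data);                         // raw, still-compressed image data
        return buf.array();
    }

    // Writes one 12-byte IFD entry holding a single value. In little-endian order a
    // SHORT stored through putInt ends up left-justified in the value field, as required.
    private static void entry(ByteBuffer buf, int tag, int type, int value) {
        buf.putShort((short) tag).putShort((short) type).putInt(1).putInt(value);
    }
}
```

The returned array could be written straight to a `.tif` file, for example with `Files.write`. For Group 3 data (/K of zero or above) the Compression value and some extra tags would differ, so this sketch does not cover that case.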
I extracted a TIFF image from a scanned PDF (one in which every page is an image) in the following way:
...
PdfReader reader = new PdfReader("source.pdf");
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
MyImageRenderListener listener = new MyImageRenderListener("destination.jpg");
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
    parser.processContent(i, listener);
}
...
Code of the MyImageRenderListener class:
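import java.awt.image.BufferedImage;
import java.io.FileOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfObject;
import com.itextpdf.text.pdf.parser.ImageRenderInfo;
import com.itextpdf.text.pdf.parser.PdfImageObject;
import com.itextpdf.text.pdf.parser.RenderListener;
import com.itextpdf.text.pdf.parser.TextRenderInfo;

class MyImageRenderListener implements RenderListener {
    // Output file; every matching image is written here, so later images overwrite earlier ones.
    protected String path = "";

    public MyImageRenderListener(String path) {
        this.path = path;
    }

    public void beginTextBlock() {
    }

    public void endTextBlock() {
    }

    public void renderImage(ImageRenderInfo renderInfo) {
        try {
            PdfImageObject image = renderInfo.getImage();
            // The filter entry may be absent or an array; only a single name is handled here.
            PdfObject filter = image.get(PdfName.FILTER);
            if (PdfName.CCITTFAXDECODE.equals(filter)) {
                BufferedImage bufferedImage = image.getBufferedImage();
                // Save the decoded TIFF image as JPEG; try-with-resources closes the stream.
                try (FileOutputStream os = new FileOutputStream(path)) {
                    ImageIO.write(bufferedImage, "jpg", os);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public void renderText(TextRenderInfo renderInfo) {
    }
}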
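The listener saves the image as JPEG. If the goal is to keep a TIFF file with fax-style compression, the decoded image could instead be handed to the Image I/O TIFF writer. The following is a minimal sketch under some assumptions: a TIFF writer is registered (built in from Java 9, otherwise via a plugin), the image is bilevel (for example `TYPE_BYTE_BINARY`), and the writer exposes the compression type name "CCITT T.6" as the JDK's TIFF plugin does. `Group4TiffWriter` and `toGroup4Tiff` are hypothetical names, not part of iText.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;
import javax.imageio.stream.MemoryCacheImageOutputStream;

// Hypothetical helper (not part of iText): encodes a bilevel image as a
// TIFF file using CCITT Group 4 (T.6) compression.
final class Group4TiffWriter {

    static byte[] toGroup4Tiff(BufferedImage image) {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("tiff");
        if (!writers.hasNext()) {
            throw new IllegalStateException(
                    "No TIFF writer registered (needs Java 9+ or a TIFF plugin)");
        }
        ImageWriter writer = writers.next();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // An in-memory cache stream gives the writer the seekable output it needs
        // without creating a temporary file.
        try (ImageOutputStream out = new MemoryCacheImageOutputStream(bytes)) {
            writer.setOutput(out);
            ImageWriteParam param = writer.getDefaultWriteParam();
            param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            param.setCompressionType("CCITT T.6"); // Group 4 fax compression
            writer.write(null, new IIOImage(image, null, null), param);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            writer.dispose();
        }
        // The stream was closed above, so all cached data has reached the byte array.
        return bytes.toByteArray();
    }
}
```

Inside renderImage, the result could be written out with something like `Files.write(Paths.get("destination.tif"), Group4TiffWriter.toGroup4Tiff(bufferedImage))`. If the decoded image turns out not to be bilevel, the writer is likely to reject the CCITT compression type, so a conversion to a 1-bit image would be needed first.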