iTextSharp Image Extraction with Transparency

Posted 2020-02-15 00:40

Question:

I am using iTextSharp and trying to extract images with transparency from a PDF. When I extract an image, its transparent areas come out solid black, so the transparency is lost. I have found multiple examples of image extraction, but all of them seem to have the same issue. The code that I am using is below.

Another example is from itextpdf.com/examples/iia.php?id=284. This example includes images in the "results" section at the top. If you click Img7.png you will see the black border in the image; however, at the bottom of the page there is a link to the original image, info.png, which shows the transparency the way it is supposed to look. This is the exact issue I am running into. Any help or ideas would be appreciated.

public void ExtractImage(string pdfFile)
        {
            const int pageNumber = 1; //Page number to extract the image from
            PdfReader pdf = new PdfReader(pdfFile);
            PdfDictionary pg = pdf.GetPageN(pageNumber);
            PdfDictionary res = (PdfDictionary)PdfReader.GetPdfObject(pg.Get(PdfName.RESOURCES));
            PdfDictionary xobj = (PdfDictionary)PdfReader.GetPdfObject(res.Get(PdfName.XOBJECT));
            foreach (PdfName name in xobj.Keys)
            {
                PdfObject obj = xobj.Get(name);
                if (obj.IsIndirect())
                {
                    PdfDictionary tg = (PdfDictionary)PdfReader.GetPdfObject(obj);
                    string width = tg.Get(PdfName.WIDTH).ToString();
                    string height = tg.Get(PdfName.HEIGHT).ToString();
                    ImageRenderInfo imgRI =
                            ImageRenderInfo.CreateForXObject(new Matrix(float.Parse(width), float.Parse(height)),
                                                             (PRIndirectReference)obj, tg);

                    var fileType= imgRI.GetImage().GetFileType();
                    // imgPath is the output folder prefix, defined elsewhere in the class
                    RenderImage(imgRI, imgPath + imgRI.GetRef().Number + "_" + imgRI.GetRef().Generation + "test." + fileType);
                }
            }
            pdf.Close();
        }

        private void RenderImage(ImageRenderInfo renderInfo, string saveImageLocation)
        {
            PdfImageObject image = renderInfo.GetImage();

            using (var dotnetImg = image.GetDrawingImage())
            {
                if (dotnetImg != null)
                {
                    dotnetImg.Save(saveImageLocation);
                }
            }
        }

Answer 1:

Please read the PDF specification (ISO 32000-1). You are making the assumption that an image such as a transparent PNG can be stored inside a PDF as a transparent PNG. That assumption is wrong.

The image type PNG isn't supported in PDF. When a transparent PNG is added to a PDF document, it is converted into two compressed bitmaps. One bitmap is the image you're referring to: the image that allegedly lost its transparency. The other bitmap, an image you didn't tell us anything about, but that is there, is a mask for this image. When you examine the Image XObject, you'll notice that it has a reference to this mask. This is explained in my book in section 10.3.2, entitled "Masking images".
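
To make the mask reference concrete, here is a minimal sketch of looking it up from the image dictionary (the `tg` variable in the question's code). It assumes the mask is stored as a soft mask under the /SMask key; other masking forms, such as /Mask, are not handled. The class name `MaskLookup` and the method `GetSoftMask` are hypothetical helpers, not part of iTextSharp, and whether `GetDrawingImage()` can decode a particular mask depends on how that mask is encoded.

```csharp
using System.Drawing;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper, not part of iTextSharp.
public static class MaskLookup
{
    // Returns the soft mask of an image XObject as a System.Drawing image,
    // or null when the dictionary has no /SMask entry.
    // Assumption: the mask is a soft mask (/SMask), not a stencil or colour-key /Mask.
    public static Image GetSoftMask(PdfDictionary imageDict)
    {
        PdfObject maskRef = imageDict.Get(PdfName.SMASK);
        if (maskRef == null)
        {
            return null;
        }

        // Resolve the indirect reference to the mask's stream object.
        PRStream maskStream = PdfReader.GetPdfObject(maskRef) as PRStream;
        if (maskStream == null)
        {
            return null;
        }

        // The mask is itself an image XObject, so it can be decoded like the base image.
        return new PdfImageObject(maskStream).GetDrawingImage();
    }
}
```

In the question's loop this could be called as `MaskLookup.GetSoftMask(tg)` right after `tg` is resolved.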

Your allegation that you have a transparent image stored in your PDF documents is wrong. Instead, you have two opaque images of which one image is the mask of the other, in order to achieve transparency. You can't extract these images as a single transparent image. You need to extract both opaque images and merge them into a single transparent image. This is outside the scope of iText(Sharp).
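
Since the merge has to happen outside iText(Sharp), one way to sketch it is with plain System.Drawing: copy each pixel of the base image and take its alpha from the corresponding mask pixel. This assumes an 8-bit grayscale soft mask in which white means fully opaque and black fully transparent; a stencil mask may use the opposite convention. `MaskMerger` and `ApplyMask` are hypothetical names, and `GetPixel`/`SetPixel` are slow, so treat this as an illustration rather than production code.

```csharp
using System.Drawing;
using System.Drawing.Imaging;

// Hypothetical helper for illustration only.
public static class MaskMerger
{
    // Combines an opaque base image with a grayscale mask into one ARGB bitmap.
    // Assumption: mask value 255 (white) = opaque, 0 (black) = transparent.
    public static Bitmap ApplyMask(Bitmap baseImage, Bitmap mask)
    {
        var result = new Bitmap(baseImage.Width, baseImage.Height, PixelFormat.Format32bppArgb);
        for (int y = 0; y < baseImage.Height; y++)
        {
            for (int x = 0; x < baseImage.Width; x++)
            {
                // Map to mask coordinates in case the mask has a different resolution.
                int mx = x * mask.Width / baseImage.Width;
                int my = y * mask.Height / baseImage.Height;

                // A gray pixel has equal R, G and B, so R serves as the alpha value.
                int alpha = mask.GetPixel(mx, my).R;
                Color c = baseImage.GetPixel(x, y);
                result.SetPixel(x, y, Color.FromArgb(alpha, c.R, c.G, c.B));
            }
        }
        return result;
    }
}
```

The two inputs would be the images returned by `GetDrawingImage()` for the base image and for its mask, each wrapped with `new Bitmap(...)`. Saving the result with `ImageFormat.Png` keeps the alpha channel.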