iText: get content size

Posted 2019-06-26 02:17

Question:

I just spent a few hours scouring the web. It seems others also have this issue, but I couldn't find an answer.

I have a whole bunch of PDF files whose measurements I need to get, namely the height and width of the page content.

In Adobe Illustrator, when you import a PDF you have the option of trimming to the "bounding box". That's exactly what I need.

I tried many approaches, here's the hodgepodge:

Dim pdfStream = IO.File.OpenRead(FilePath)
Dim img = PdfImages(pdfStream)
Dim pdfReader = New PdfReader(pdfStream)
Dim pdfDictionary = pdfReader.GetPageN(1)
Dim mediaBox = pdfDictionary.GetAsArray(PdfName.MEDIABOX)
Dim b = pdfReader.GetPageSize(pdfDictionary)
Dim ms = New MemoryStream
Dim document = New Document(pdfReader.GetPageSizeWithRotation(1))
Dim writer = PdfWriter.GetInstance(document, ms)
document.Open()
document.SetPageSize(pdfReader.GetPageSize(1))
document.NewPage()
Dim cb = writer.DirectContent
cb.Clip()
Dim pageImport = writer.GetImportedPage(pdfReader, 1)
pdfReader.Close()
pdfStream.Close()

All I manage to get is the page size, which is useless. I tried this on a whole bunch of PDFs, so it's not like one corrupt file or something.

Answer 1:

To achieve your goal ("trimming to the 'bounding box'. That's exactly what I need"), you actually have to solve two problems:

  1. You have to change the crop boxes of the individual pages of some PDF document.
  2. You have to determine the bounding box of some page, i.e. (as I assume) the smallest box (with horizontal and vertical sides) containing all visible content of a page.

Ad 1) change the crop boxes of the individual pages

You should not use the code you found for that task. Manipulating a single document is almost always best done using a PdfStamper, not a PdfWriter.

The iText in Action — 2nd Edition sample CropPages.java / CropPages.cs shows how to do that. The central method:

public byte[] ManipulatePdf(byte[] src)
{
  PdfReader reader = new PdfReader(src);
  int n = reader.NumberOfPages;
  PdfDictionary pageDict;
  // Crop box as llx, lly, urx, ury in points; the same box is used for every page
  PdfRectangle rect = new PdfRectangle(55, 76, 560, 816);
  for (int i = 1; i <= n; i++)   // page numbers are 1-based
  {
    pageDict = reader.GetPageN(i);
    pageDict.Put(PdfName.CROPBOX, rect);
  }
  using (MemoryStream ms = new MemoryStream())
  {
    using (PdfStamper stamper = new PdfStamper(reader, ms))
    {
      // Nothing to add: disposing the stamper writes the modified document to ms
    }
    return ms.ToArray();
  }
}

(The code works in memory, i.e. expects a byte[] and returns one, but can easily be revised to work in the file system.)
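Revising it to work on files only needs a thin wrapper around the in-memory method: read the source bytes, run the manipulation, write the result. A minimal Java sketch, assuming the manipulation is available as a byte[]-to-byte[] function (the CropFiles class and the UnaryOperator parameter are illustrative assumptions, not part of the iText sample):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.UnaryOperator;

class CropFiles {
    // Hypothetical helper: reads src, applies an in-memory PDF manipulation
    // (e.g. a byte[] -> byte[] method like ManipulatePdf above) and writes dest.
    static void manipulateFile(Path src, Path dest, UnaryOperator<byte[]> manipulate)
            throws IOException {
        byte[] input = Files.readAllBytes(src);
        byte[] output = manipulate.apply(input);
        // Files.write creates dest, or truncates it if it already exists.
        Files.write(dest, output);
    }
}
```

Keeping the PDF logic byte-based and doing the I/O in one small wrapper makes the manipulation easy to test without touching the file system.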

As you can see, you manipulate the PDF as held by the PdfReader and only use the PdfStamper to store the changed PDF.

In your case, though, there is no fixed rectangle for all pages but instead you have to determine the rectangle for each page...

Ad 2) determine the bounding box of some page

To determine the bounding box you actually have to parse the whole page content and determine the dimensions of each drawn element.
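Once the dimensions of each element are known, combining them is simple: the page's bounding box is the union of the individual boxes. The plain-Java sketch below shows only that accumulation step; the BoundingBox class and its methods are hypothetical and not iText API (TextMarginFinder does something along these lines for text, with its own types).

```java
class BoundingBox {
    private float llx = Float.POSITIVE_INFINITY;
    private float lly = Float.POSITIVE_INFINITY;
    private float urx = Float.NEGATIVE_INFINITY;
    private float ury = Float.NEGATIVE_INFINITY;

    // Grows the box so that it also covers the rectangle (x1, y1)-(x2, y2);
    // the two corners may be given in any order.
    void add(float x1, float y1, float x2, float y2) {
        llx = Math.min(llx, Math.min(x1, x2));
        lly = Math.min(lly, Math.min(y1, y2));
        urx = Math.max(urx, Math.max(x1, x2));
        ury = Math.max(ury, Math.max(y1, y2));
    }

    // True until at least one element has been added (e.g. a blank page).
    boolean isEmpty() {
        return llx > urx;
    }

    float getLlx() { return llx; }
    float getLly() { return lly; }
    float getUrx() { return urx; }
    float getUry() { return ury; }
}
```

A page without any drawn content leaves the box empty, so check isEmpty() before turning the coordinates into a crop box.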

Unfortunately iText(Sharp) supports this in a comfortable manner only up to a certain degree: It provides a content parsing framework, but this framework does not yet handle vector graphics out of the box.

The iText in Action — 2nd Edition sample ShowTextMargins.java / ShowTextMargins.cs shows how you can use that framework to determine the cropbox (vector graphics ignored). The essential code:

PdfReaderContentParser parser = new PdfReaderContentParser(reader);
[...]
TextMarginFinder finder = parser.ProcessContent(i, new TextMarginFinder());

After that ProcessContent call, finder.GetLlx(), finder.GetLly(), finder.GetUrx(), and finder.GetUry() return the coordinates of the lower left and upper right corners of the bounding box of page i (vector graphics ignored). You can use these values to construct the rectangle to pass to pageDict.Put(PdfName.CROPBOX, rect) in the code above.
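Since the question asks for the width and height of the content, note that these follow directly from the four coordinates: width = urx - llx and height = ury - lly, in PDF user space units (points, 1/72 inch by default). A small Java sketch of that arithmetic with a conversion to millimetres; the ContentSize class is hypothetical:

```java
class ContentSize {
    // 1 pt = 1/72 in and 1 in = 25.4 mm
    static final double MM_PER_POINT = 25.4 / 72.0;

    // Width of the box spanned by llx..urx, in points.
    static double width(double llx, double urx) {
        return urx - llx;
    }

    // Height of the box spanned by lly..ury, in points.
    static double height(double lly, double ury) {
        return ury - lly;
    }

    static double toMillimetres(double points) {
        return points * MM_PER_POINT;
    }
}
```

For example, content from (36, 36) to (558, 756) measures 522 by 720 points, i.e. 7.25 by 10 inches.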

If you need to also take vector graphics into account, though, you'll have to extend the parser namespace classes somewhat to also create parsing events for vector graphics operators, and the TextMarginFinder to also take those events into account. For more on this read this answer.
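The geometric side of such an extension is fairly small. Path coordinates are given in user space, so each point has to be mapped to page coordinates with the current transformation matrix [a b c d e f] (x' = a·x + c·y + e, y' = b·x + d·y + f) before it is added to the box. For Bézier curves, including the control points gives a safe, possibly slightly too large, box, because a curve always lies inside the convex hull of its control points. The plain-Java sketch below covers only this geometry; the PathBounds class is hypothetical, stroke width and clipping are ignored, and wiring it to iText's parsing events is not shown.

```java
class PathBounds {
    private double llx = Double.POSITIVE_INFINITY, lly = Double.POSITIVE_INFINITY;
    private double urx = Double.NEGATIVE_INFINITY, ury = Double.NEGATIVE_INFINITY;

    // Adds the points of one path segment (end points and, for curves, the
    // control points) given as x0, y0, x1, y1, ... in user space.
    // ctm = {a, b, c, d, e, f}, the same layout as the operands of the PDF "cm" operator.
    void addPoints(double[] ctm, double... xy) {
        for (int i = 0; i + 1 < xy.length; i += 2) {
            double x = xy[i], y = xy[i + 1];
            double px = ctm[0] * x + ctm[2] * y + ctm[4];
            double py = ctm[1] * x + ctm[3] * y + ctm[5];
            llx = Math.min(llx, px);
            lly = Math.min(lly, py);
            urx = Math.max(urx, px);
            ury = Math.max(ury, py);
        }
    }

    // Returns {llx, lly, urx, ury} in page coordinates.
    double[] box() {
        return new double[] {llx, lly, urx, ury};
    }
}
```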



Answer 2:

Necromancing:
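mkl's code put into practice. Since TextMarginFinder ignores vector graphics, the workaround is to put some small white text into the top-left and the lower-right corner of your vector graphics, so that the text bounds cover the whole drawing: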

public static void StartManipulation()
{
    byte[] ba = System.IO.File.ReadAllBytes(@"D:\username\Documents\Downloads\itextsharp-master\itextsharp-master\src\CropTest\Files\dwg305.pdf");
    // FindBoundingBox(ba);
    ba = ManipulatePdf(ba);
    System.IO.File.WriteAllBytes(@"D:\username\Downloads\mysizedpdf.pdf", ba);
} // End Sub StartManipulation



public static byte[] ManipulatePdf(byte[] src)
{
    byte[] byteBuffer = null;

    using (iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(src))
    {
        iTextSharp.text.pdf.parser.PdfReaderContentParser parser = new iTextSharp.text.pdf.parser.PdfReaderContentParser(reader);
        int n = reader.NumberOfPages;
        iTextSharp.text.pdf.PdfDictionary pageDict;

        for (int pageNumber = 1; pageNumber <= n; pageNumber++)
        {
            pageDict = reader.GetPageN(pageNumber);

            iTextSharp.text.pdf.parser.TextMarginFinder finder = parser.ProcessContent(pageNumber, new iTextSharp.text.pdf.parser.TextMarginFinder());

            // iTextSharp.text.Rectangle pageSize = reader.GetPageSize(pageNumber);

            // Get Content Size
            float Llx = finder.GetLlx();
            float Lly = finder.GetLly();
            float Urx = finder.GetUrx();
            float Ury = finder.GetUry();
            //iTextSharp.text.pdf.PdfRectangle rect = new iTextSharp.text.pdf.PdfRectangle(55, 76, 560, 816);
            //iTextSharp.text.pdf.PdfRectangle rectTextContentSize = new iTextSharp.text.pdf.PdfRectangle(Llx, Lly, Urx, Ury);

            // Extra space around the text bounds, in PDF user units (points)
            int SafetyMargin = 100;
            iTextSharp.text.pdf.PdfRectangle rectTextContentSize = new iTextSharp.text.pdf.PdfRectangle(Llx - SafetyMargin, Lly - SafetyMargin, Urx + SafetyMargin, Ury + SafetyMargin);

            pageDict.Put(iTextSharp.text.pdf.PdfName.CROPBOX, rectTextContentSize);
        } // Next pageNumber 

        using (System.IO.MemoryStream ms = new System.IO.MemoryStream())
        {
            using (iTextSharp.text.pdf.PdfStamper stamper = new iTextSharp.text.pdf.PdfStamper(reader, ms))
            { }

            byteBuffer = ms.ToArray();
        } // End Using ms

    } // End Using reader 

    return byteBuffer;
} // End Function ManipulatePdf 


public static System.Drawing.Size FindBoundingBox(byte[] src)
{
    System.Drawing.Size sze = default(System.Drawing.Size);
    // iTextSharp.text.pdf
    // iTextSharp.text.pdf.parser

    using (iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(src))
    {
        iTextSharp.text.pdf.parser.PdfReaderContentParser parser = new iTextSharp.text.pdf.parser.PdfReaderContentParser(reader);

        for (int pageNumber = 1; pageNumber <= reader.NumberOfPages; pageNumber++)
        {
            iTextSharp.text.pdf.parser.TextMarginFinder finder = parser.ProcessContent(pageNumber, new iTextSharp.text.pdf.parser.TextMarginFinder());

            iTextSharp.text.Rectangle pageSize = reader.GetPageSize(pageNumber);
            float Llx = finder.GetLlx();
            float Lly = finder.GetLly();
            float Urx = finder.GetUrx();
            float Ury = finder.GetUry();

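            // Same edges measured from the top of the page (top-left origin, as used by PDFsharp); not used below
            float PdfSharpLly = pageSize.Height - Lly;
            float PdfSharpUry = pageSize.Height - Ury;

            // Note: sze is overwritten for each page, so the method returns the content size of the last page
            sze = new System.Drawing.Size((int)(Urx - Llx), (int)(Ury - Lly));

            System.Console.WriteLine("Width: {0}\r\nHeight: {1}", pageSize.Width, pageSize.Height);
            System.Console.WriteLine("Llx: {0}\r\nLly: {1}\r\nUrx: {2}\r\nUry: {3}\r\n", Llx, Lly, Urx, Ury);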
        } // Next pageNumber 

    } // End Using reader 

    return sze;
} // End Function FindBoundingBox 
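About the SafetyMargin in ManipulatePdf above: the margin is in PDF user units (points), and according to the PDF specification a crop box that extends beyond the media box is effectively reduced to its intersection with the media box. A generous margin is therefore harmless, but it will not enlarge the page; clamping it yourself makes the stored crop box match what viewers actually show. A plain-Java sketch of padding plus clamping (the CropMath class and its method are hypothetical):

```java
class CropMath {
    // Expands the content box {llx, lly, urx, ury} by margin on every side and
    // clips the result to the media box, also given as {llx, lly, urx, ury}.
    static float[] padAndClamp(float[] content, float margin, float[] mediaBox) {
        return new float[] {
            Math.max(content[0] - margin, mediaBox[0]),
            Math.max(content[1] - margin, mediaBox[1]),
            Math.min(content[2] + margin, mediaBox[2]),
            Math.min(content[3] + margin, mediaBox[3])
        };
    }
}
```

With an A4 media box [0 0 595 842], content at (50, 60)-(500, 700) and a margin of 100, the result is [0 0 595 800].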


Answer 3:

I have some code that may help you:

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.Map;

import com.itextpdf.text.BaseColor;
import com.itextpdf.text.Element;
import com.itextpdf.text.FontFactory;
import com.itextpdf.text.Image;
import com.itextpdf.text.Rectangle;
import com.itextpdf.text.pdf.BarcodeQRCode;
import com.itextpdf.text.pdf.PdfArray;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;
import com.itextpdf.text.pdf.qrcode.EncodeHintType;

public class PdfSigner {

    // Stamps two text lines and a QR code near the lower-left corner of the
    // visible (cropped) area of every page and writes the result to sign.pdf.
    public static void sign(String src) {
        try {
            String line1 = "Sign By: (VINICIUS)";
            String line2 = "Security Seal Number: 123545678";

            ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
            byte[] array = Files.readAllBytes(new File(src).toPath());
            int size = 36; // QR code width and height in points
            String docUrl = "https://website.com";
            Map<EncodeHintType, Object> hints = new HashMap<EncodeHintType, Object>();
            BarcodeQRCode qrCode = new BarcodeQRCode(docUrl, size, size, hints);
            PdfReader reader = new PdfReader(array);
            PdfStamper stamper = new PdfStamper(reader, outputStream);
            int pageCount = reader.getNumberOfPages();

            for (int i = 1; i <= pageCount; i++) {
                // Default positions, reset for every page so values don't carry over
                float y1 = 30f;
                float y2 = 20f;
                float y3 = 10f;
                float resultX = 30 + size;
                float imgX = 15f;

                PdfContentByte contentByte = stamper.getOverContent(i);
                Rectangle pgSize = reader.getPageSizeWithRotation(i);
                // Pages taller than A4 (842 pt): place the stamp as if on an A4 page aligned with the top edge
                if (pgSize.getHeight() > 842) {
                    y1 = pgSize.getHeight() - 812;
                    y2 = pgSize.getHeight() - 822;
                    y3 = pgSize.getHeight() - 832;
                }
                PdfDictionary pageDict = reader.getPageN(i);
                PdfArray cropbox = pageDict.getAsArray(PdfName.CROPBOX);
                if (cropbox != null) {
                    // The crop box is [llx lly urx ury]: offset from its lower-left corner
                    float llx = cropbox.getAsNumber(0).floatValue();
                    float lly = cropbox.getAsNumber(1).floatValue();
                    resultX = llx + 30 + size;
                    y1 = lly + 30;
                    y2 = lly + 20;
                    y3 = lly + 10;
                    imgX = llx + 15;
                }

                // Two lines of 7 pt Helvetica to the right of the QR code
                contentByte.beginText();
                contentByte.setFontAndSize(FontFactory.getFont(FontFactory.HELVETICA).getBaseFont(), 7);
                contentByte.setColorFill(BaseColor.DARK_GRAY);
                contentByte.showTextAligned(Element.ALIGN_LEFT, line1, resultX, y1, 0);
                contentByte.showTextAligned(Element.ALIGN_LEFT, line2, resultX, y2, 0);
                contentByte.endText();

                // The QR code image, lower-left corner at (imgX, y3)
                Image image = qrCode.getImage();
                image.setScaleToFitHeight(true);
                image.setAbsolutePosition(imgX, y3);
                image.setBorder(Image.NO_BORDER);
                image.setSpacingAfter(0);
                image.setSpacingBefore(0);
                contentByte.addImage(image);
            }

            stamper.close();
            reader.close();

            // Files.write creates sign.pdf or overwrites an existing one
            Files.write(new File("sign.pdf").toPath(), outputStream.toByteArray());

        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
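For the placement logic in this code, keep in mind that a PDF rectangle such as /CropBox is stored as [llx lly urx ury], i.e. two corners rather than an origin plus width and height, and the specification allows any two diagonally opposite corners, so it should be normalized before use. Positions relative to the visible area are then offsets from the normalized lower-left corner. A plain-Java sketch (the StampPosition class and its methods are hypothetical):

```java
class StampPosition {
    // Normalizes a PDF rectangle (any two opposite corners) to {llx, lly, urx, ury}.
    static float[] normalize(float[] r) {
        return new float[] {
            Math.min(r[0], r[2]), Math.min(r[1], r[3]),
            Math.max(r[0], r[2]), Math.max(r[1], r[3])
        };
    }

    // Returns {x, y} of the point offset by (dx, dy) from the box's lower-left corner.
    static float[] fromLowerLeft(float[] box, float dx, float dy) {
        float[] n = normalize(box);
        return new float[] {n[0] + dx, n[1] + dy};
    }

    // Width and height of the box; these are NOT simply the 3rd and 4th entries.
    static float[] size(float[] box) {
        float[] n = normalize(box);
        return new float[] {n[2] - n[0], n[3] - n[1]};
    }
}
```

For the crop box [55 76 560 816] from the CropPages sample, a stamp at offset (66, 30) lands at (121, 106), and the visible area is 505 by 740 points.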