PDF scaling/importing pages that have been trimmed

2019-08-10 19:12发布

问题:

I am currently in the process of writing an app that "formats" PDF's to our requirements, using iText 2.1.7.

We basically take a portrait PDF, and scale down the pages, so we can fit 2 pages of the original PDF, on one landscape page of the new PDF. We also leave some space at the bottom of the page which is used for post processing.

This process works 90% of the time as it should.

However, we received a PDF that has been cropped/trimmed by the content department, and when we view this PDF in Acrobat, it looks as excpected. However, when we process it, the new PDF includes the entire original MediaBox, and the crop lines.

Here is the code we use, and how a problem output looks.

File tempFile = new File(tempFilename);
PdfReader reader = new PdfReader(originalPdfFile);
Document doc = new Document(new RectangleReadOnly(842f, 595f), 0, 0, 0, 0);
PdfWriter writer = PdfWriter.getInstance(doc, new FileOutputStream(tempFile));
doc.open();
for (int i = 1; i < reader.getNumberOfPages(); i = i + 2) {
    doc.newPage();
    PdfContentByte cb = writer.getDirectContent();
    PdfImportedPage page = writer.getImportedPage(reader, i); // page #1

    float documentWidth = doc.getPageSize().getWidth() / 2;
    float documentHeight = doc.getPageSize().getHeight() - 65f;

    float pageWidth = page.getWidth();
    float pageHeight = page.getHeight();

    float widthScale = documentWidth / pageWidth;
    float heightScale = documentHeight / pageHeight;
    float scale = Math.min(widthScale, heightScale);

    float offsetX = (documentWidth - (pageWidth * scale)) / 2;
    float offsetY = 65f; //100f

    cb.addTemplate(page, scale, 0, 0, scale, offsetX, offsetY);

    PdfImportedPage page2 = writer.getImportedPage(reader, i+1); // page #2

    pageWidth = page.getWidth();
    pageHeight = page.getHeight();

    widthScale = documentWidth / pageWidth;
    heightScale = documentHeight / pageHeight;
    scale = Math.min(widthScale, heightScale);

    offsetX = ((documentWidth - (pageWidth * scale)) / 2) + documentWidth;
    offsetY = 65f; //100f

    cb.addTemplate(page2, scale, 0, 0, scale, offsetX, offsetY);//430f
    }

doc.close();

original in acrobat:

modified in acrobat, showing unwanted pretrim content:

回答1:

Although it's hard to be sure without seeing the PDF itself, I suspect your problem is that this PDF specifies a CropBox on at least some of its pages. If that's the case, then I think that you would want to do something like page.setBoundingBox(reader.getCropBox(i)); right after you get the page reference.

Note that the default value of a page's CropBox is it's MediaBox, so the addition of the above line shouldn't negatively impact the layout of PDF pages that do not specify a CropBox.

(I am not an iText user, so this is a bit of speculation on my part...)

Good luck!



回答2:

after a lot of frustration, I finally got this to work, by "hard cropping" the PDF before doing my scaling and layout processing.

The hard cropping takes an Acrobat cropped PDF (cropped = hidden), and uses PdfStamper to create a new PDF only containing the contents from inside the crop box.

public String cropPdf(String pdfFilePath) throws DocumentException, IOException {
    String filename = FilenameUtils.getBaseName(pdfFilePath) + "_cropped." + FilenameUtils.getExtension(pdfFilePath);
    filename = FilenameUtils.concat(System.getProperty("java.io.tmpdir"), filename);
    PdfReader reader = new PdfReader(pdfFilePath);
    try {
        PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(filename));
        try {
            for (int i = 1; i <= reader.getNumberOfPages(); i++) {
                PdfDictionary pdfDictionary = reader.getPageN(i);
                PdfArray cropArray = new PdfArray();
                Rectangle cropbox = reader.getCropBox(i);                   
                cropArray.add(new PdfNumber(cropbox.getLeft()));
                cropArray.add(new PdfNumber(cropbox.getBottom()));
                cropArray.add(new PdfNumber(cropbox.getLeft() + cropbox.getWidth()));
                cropArray.add(new PdfNumber(cropbox.getBottom() + cropbox.getHeight()));
                pdfDictionary.put(PdfName.CROPBOX, cropArray);
                pdfDictionary.put(PdfName.MEDIABOX, cropArray);
                pdfDictionary.put(PdfName.TRIMBOX, cropArray);
                pdfDictionary.put(PdfName.BLEEDBOX, cropArray);
            }
            return filename;
        } finally {
            stamper.close();
        }
    } finally {
        reader.close();
    }
}


回答3:

One minor but important fix to Kabals answer: the boxes expect width/height instead of coordinates:

            ...
            cropArray.add(new PdfNumber(cropbox.getLeft()));
            cropArray.add(new PdfNumber(cropbox.getBottom()));
            cropArray.add(new PdfNumber(cropbox.getWidth()));
            cropArray.add(new PdfNumber(cropbox.getHeight()));
            ...