pdf file size is largely increased when copied usi

Posted 2019-05-13 22:23

Question:

I am trying to copy an existing PDF file to a new file using the itextpdf library (version 5.5.10) in Java. I am running into different issues with both approaches, PdfStamper and PdfCopy. When I use the PdfStamper class, the new file is much larger than the original, although nothing new was added. Here is the code:

    // Backslashes in Java string literals must be escaped
    String currFile = "C:\\misc\\pdffiles\\AcroJS.pdf";
    String dest = "C:\\misc\\pdffiles\\AcroJS_copy.pdf";
    PdfReader reader = new PdfReader(currFile);
    PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
    stamper.close();
    reader.close();

Some observations: a 7 MB original grows to about 13 MB in the new file; a 116 KB original grows to about 119 KB.

I was expecting approximately the same file size when just copying an existing PDF file. I cannot figure out why the size increases that much.

I have tried the PdfCopy class as well. I followed two approaches with PdfCopy:

  1. Copy page by page.
  2. Call setMergeFields() on the PdfCopy object, then call pdfcopy.addDocument(reader).

But the problem with both approaches is that they throw away some non-content metadata from the PDF file, and hence the new PDF breaks when opened in Adobe Reader. For example, my PDF contains a dictionary object PdfName.S. In this case the newly created PDF file is just 2 KB (the original was 1.6 MB), which clearly means nothing was copied into the document and it is broken.

My original requirement is very simple: copy an existing PDF to a new PDF file, without an increase in size and without throwing away necessary items. Obviously, a plain copy, paste, and rename of the file is not enough, because in the next step I have some processing to do on the PDF content. Any help will be much appreciated.

OS: Windows 10 Pro, Java: 1.8.101, iText: 5.5.10

thanks

Answer 1:

Use of PdfStamper

Your code

Your code

    PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
    stamper.close();

essentially tells iText to copy the original PDF, throwing away unused objects and using iText's default compression settings.

iText's default compression settings do not use compressed cross-reference streams and object streams (introduced in PDF 1.5) but the older technique of cross-reference tables and individually compressed objects.

The sample file, on the other hand, does use these newer techniques. Thus, it is much better compressed.
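To see which storage technique a given file uses, one rough option is to scan its raw bytes for the name tokens that mark object streams and cross-reference streams. The sketch below uses only the Java standard library; the class and method names (PdfStructureProbe, usesObjectStreams, usesXrefStreams) are hypothetical, and this is a heuristic rather than a real PDF parser, so binary stream data could in principle produce a false positive.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Heuristic probe (hypothetical helper): searches raw PDF bytes for name tokens. */
final class PdfStructureProbe {

    /** Returns true if the ASCII token occurs anywhere in data. */
    static boolean containsToken(byte[] data, String token) {
        byte[] needle = token.getBytes(StandardCharsets.US_ASCII);
        outer:
        for (int i = 0; i + needle.length <= data.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (data[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }

    /** Object streams carry /Type /ObjStm in their (uncompressed) stream dictionary. */
    static boolean usesObjectStreams(byte[] pdf) {
        return containsToken(pdf, "/ObjStm");
    }

    /** Cross-reference streams carry /Type /XRef; this also matches /XRefStm in hybrid files. */
    static boolean usesXrefStreams(byte[] pdf) {
        return containsToken(pdf, "/XRef");
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PdfStructureProbe.java some.pdf
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("object streams: " + usesObjectStreams(pdf));
        System.out.println("xref streams:   " + usesXrefStreams(pdf));
    }
}
```

Running it on the original and on the stamped copy should show whether the copy lost the compressed structures, which would be consistent with the size increase described above.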

Code with full compression

You can tell iText to use these improved compression techniques, too, like this:

    PdfReader reader = new PdfReader(resourceStream);
    PdfStamper stamper = new PdfStamper(reader, outputStream);
    // Write object streams and a compressed cross-reference stream (PDF 1.5+)
    stamper.setFullCompression();

    stamper.close();

(Stamping.java test method testStampAcroJSCompressed)

This results in a file less than 4 MB in size.

Code with append mode

If you want to remain faithful to the original way objects were stored, you can instead use the append mode, which identically copies the original file and adds changes in the form of a so-called incremental update, like this:

    PdfReader reader = new PdfReader(resourceStream);
    // '\0' keeps the original PDF version; true switches on append mode
    PdfStamper stamper = new PdfStamper(reader, outputStream, '\0', true);

    stamper.close();

(Stamping.java test method testStampAcroJSAppended)

This results in a file slightly larger than the original file.
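Because append mode copies the original bytes unchanged and only adds an incremental update after them, the result can be sanity-checked without iText: the new file should begin with the complete original file. The following is a minimal sketch using only the Java standard library; the class and method names (IncrementalUpdateCheck, startsWithOriginal) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper: checks that a revised file is the original plus appended bytes. */
final class IncrementalUpdateCheck {

    /** Returns true if revised begins with every byte of original, in order. */
    static boolean startsWithOriginal(byte[] original, byte[] revised) {
        if (revised.length < original.length) {
            return false;
        }
        for (int i = 0; i < original.length; i++) {
            if (original[i] != revised[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java IncrementalUpdateCheck.java original.pdf stamped.pdf
        byte[] original = Files.readAllBytes(Paths.get(args[0]));
        byte[] revised = Files.readAllBytes(Paths.get(args[1]));
        System.out.println("original preserved as prefix: "
                + startsWithOriginal(original, revised));
        System.out.println("bytes appended: " + (revised.length - original.length));
    }
}
```

A file written without append mode is rebuilt from scratch, so it would normally fail this check even if it renders identically.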

Use of PdfCopy

You observed that PdfCopy

"is throwing away some non-content metadata"

Of course it does. PdfCopy is designed to copy pages from one PDF to another, keeping content and annotations as they were but ignoring other page-level and all document-level information.



Tags: java pdf itext