I'm attempting to stitch together various PDFs. They're not that text heavy, with the occasional image. Say for example I have two PDFs, 1.4 MB and 740 KB: when I combine them they balloon to 6 MB!
I've tried scripted combination and hand appending, with the same result, so I'm guessing it's an underlying issue. Some explanation of why it happens would be useful, so I can look at ways of avoiding it. Is it a mismatch in colour models? The fonts are minimal.
You aren't telling us how you're combining the PDFs, which makes your question rather theoretical, so I am going to give you a theoretical answer:
Part 1
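Suppose that you have a 10-page PDF of about 1200 KByte: ten content streams of 100 KByte each, plus 200 KByte of resources (such as fonts) that all pages share. If you "burst" this PDF into 10 separate single-page PDFs, each PDF will be about 300 KByte: 100 KByte of content stream + 200 KByte of resources (I'm ignoring the overhead of having 10 separate xref tables and file trailers). Together, those ten files add up to 3000 KByte, because the 200 KByte of resources is now stored ten times.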
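The arithmetic can be sketched in a few lines of Java. This is only a model of the idea, not iText code: the class MergeSizeSketch, its Page type and the resource id "shared-resources" are hypothetical names made up for the illustration, and the id stands in for the byte-for-byte comparison a real tool would have to do.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MergeSizeSketch {
    // Hypothetical model of a single-page PDF: a content stream plus the resources it needs.
    static final class Page {
        final int contentKB;
        final String resourceId; // stands in for comparing the actual resource bytes
        final int resourceKB;

        Page(int contentKB, String resourceId, int resourceKB) {
            this.contentKB = contentKB;
            this.resourceId = resourceId;
            this.resourceKB = resourceKB;
        }
    }

    // Copies every page together with its own copy of the resources.
    static int naiveMergeKB(List<Page> pages) {
        int total = 0;
        for (Page p : pages) {
            total += p.contentKB + p.resourceKB;
        }
        return total;
    }

    // Stores each distinct resource once, however many pages refer to it.
    static int dedupMergeKB(List<Page> pages) {
        int total = 0;
        Set<String> seen = new HashSet<>();
        for (Page p : pages) {
            total += p.contentKB;
            if (seen.add(p.resourceId)) { // add() returns true only the first time
                total += p.resourceKB;
            }
        }
        return total;
    }

    // The ten single-page files from the example: 100 KByte content + 200 KByte resources each.
    static List<Page> burstExample() {
        List<Page> pages = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            pages.add(new Page(100, "shared-resources", 200));
        }
        return pages;
    }

    public static void main(String[] args) {
        List<Page> pages = burstExample();
        System.out.println("naive merge: " + naiveMergeKB(pages) + " KByte");
        System.out.println("dedup merge: " + dedupMergeKB(pages) + " KByte");
    }
}
```

The second method is cheaper in output size but has to remember everything it has already seen, which is where the extra memory and CPU cost of a deduplicating merge comes from.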
If you're using iText to combine the PDFs, then using PdfCopy will result in the 3000 KByte PDF, because PdfCopy just copies documents as fast as possible without looking at the content of the document. If you want the 1200 KByte PDF, then you need to use PdfSmartCopy, in which case you'll need more memory and CPU because iText will examine each PDF and reuse objects that would otherwise be redundant.
Part 2
In your question, you mention that you have a 1.4 MB and a 740 KB PDF, and that combining them results in a PDF of 6 MB. The first part of my theoretical example doesn't explain such extreme growth in size, so here's a second part.
Suppose that your original PDFs use compressed object streams and a compressed cross-reference stream (features introduced in PDF 1.5). Suppose that the tool you use writes the combined file in the older PDF 1.4 style, with a plain cross-reference table and every object stored on its own. In that case, the objects and the cross-reference data will no longer be compressed, resulting in a much bigger file size.
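To get a feel for how much this kind of compression can matter, here is a rough sketch using only the JDK's Deflater. It does not read or write real PDFs: the class StreamCompressionSketch and the sample dictionaries are invented for the illustration, and the assumption is simply that the small structural objects in a PDF are repetitive text that deflates well.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

class StreamCompressionSketch {
    // Builds text shaped like many small PDF dictionary objects (made-up content).
    static byte[] sampleObjects(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= count; i++) {
            sb.append(i)
              .append(" 0 obj\n<< /Type /Annot /Subtype /Link /Rect [0 0 ")
              .append(i)
              .append(" 20] /Border [0 0 0] >>\nendobj\n");
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    // Returns the number of bytes the input occupies after Deflate compression.
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer); // only the byte count matters here
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        byte[] raw = sampleObjects(1000);
        System.out.println("uncompressed: " + raw.length + " bytes");
        System.out.println("deflated:     " + deflatedSize(raw) + " bytes");
    }
}
```

Running it prints both sizes for comparison. How large the gap is in a real document depends on how many such objects it contains, so treat the numbers as an indication only.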
Part 3?
There might be other reasons, depending on the nature of the original PDFs and on the tool that you're using to combine the PDFs. You should clarify if none of the above applies.