I tried some utilities for digital PDF signing based on iText v1 or v2 and found that the whole PDF seems to be loaded into memory (for a 60 MB PDF, the process can take up to 300-400 MB of memory).
Can recent iText versions sign a PDF without loading it into memory?
Updates
I tested Bruno's example with itextpdf 5.5.6:
- The PdfReader constructor doesn't matter: it can be (src), (src, null, true) or (src, null, false), and the result is the same.
- What matters is new File(tmp) in createSignature.
But memory consumption is still too high. I tried to sign a 100 MB file (a PDF with an embedded attachment), and peak memory was about 325 MB. Sure, that's better than the 540 MB without a temporary file, but not good enough.
With a 32 KB file, peak memory was 65 MB (that's the JVM and the Java code itself, I guess).
Memory was measured with /usr/bin/time -v java ....
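/usr/bin/time -v measures the whole java process, so its figure includes the JVM's own footprint (presumably the 65 MB baseline above). To see only how much Java heap a run needed, a minimal sketch using the standard java.lang.management API could look like this; the class and method names are hypothetical, and the array allocation merely stands in for the signing call.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper class, not part of iText.
class HeapPeak {

    /** Resets the peak-usage counters of all heap memory pools. */
    static void resetHeapPeaks() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                pool.resetPeakUsage();
            }
        }
    }

    /**
     * Sums the peak usage (bytes) of all heap pools since the last reset.
     * This is an upper bound: the pools may have peaked at different times.
     */
    static long heapPeakBytes() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                MemoryUsage peak = pool.getPeakUsage(); // null if the pool is no longer valid
                if (peak != null) {
                    total += peak.getUsed();
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        resetHeapPeaks();
        // Stand-in workload: replace with the actual signing call.
        byte[] big = new byte[50 * 1024 * 1024];
        big[big.length - 1] = 1;
        System.out.printf("heap peak: %d MB%n", heapPeakBytes() / (1024 * 1024));
    }
}
```

Unlike the RSS reported by /usr/bin/time, this number is directly comparable to an -Xmx limit.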
I limited the Java heap with -Xmx100m, but then it crashed with an OutOfMemoryError:
    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at com.itextpdf.text.pdf.PdfReader.getStreamBytesRaw(PdfReader.java:2576)
        at com.itextpdf.text.pdf.PdfReader.getStreamBytesRaw(PdfReader.java:2615)
        at com.itextpdf.text.pdf.PRStream.toPdf(PRStream.java:230)
        at com.itextpdf.text.pdf.PdfIndirectObject.writeTo(PdfIndirectObject.java:158)
        at com.itextpdf.text.pdf.PdfWriter$PdfBody.write(PdfWriter.java:420)
        at com.itextpdf.text.pdf.PdfWriter$PdfBody.add(PdfWriter.java:398)
        at com.itextpdf.text.pdf.PdfWriter.addToBody(PdfWriter.java:887)
        at com.itextpdf.text.pdf.PdfStamperImp.close(PdfStamperImp.java:412)
        at com.itextpdf.text.pdf.PdfStamperImp.close(PdfStamperImp.java:386)
        at com.itextpdf.text.pdf.PdfSignatureAppearance.preClose(PdfSignatureAppearance.java:1316)
        at com.itextpdf.text.pdf.security.MakeSignature.signDetached(MakeSignature.java:140)
The code at the top of that stack trace is:
    public static byte[] getStreamBytesRaw(final PRStream stream, final RandomAccessFileOrArray file) throws IOException {
        PdfReader reader = stream.getReader();
        byte b[];
        if (stream.getOffset() < 0)
            b = stream.getBytes();
        else {
            b = new byte[stream.getLength()]; // <-- OutOfMemoryError is thrown here
            file.readFully(b);
            // ... (rest of the method omitted)
In the debugger I can see that the stream type is EmbeddedFile and its length is 100 MB, so the whole embedded file is being read into memory.
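The marked line, new byte[stream.getLength()], needs a single heap array as large as the whole stream, so a 100 MB embedded file needs at least 100 MB of heap at that point regardless of the temporary file. For contrast, here is a minimal pure-JDK sketch of copying the same byte range with a fixed-size buffer, which keeps memory use constant; the RangeCopy class and copyRange method are hypothetical and not part of iText.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;

// Hypothetical helper, not an iText API.
class RangeCopy {

    /**
     * Copies length bytes starting at offset from file to out, using a
     * fixed 64 KB buffer instead of one array of size length.
     */
    static void copyRange(RandomAccessFile file, long offset, long length, OutputStream out)
            throws IOException {
        byte[] buffer = new byte[64 * 1024];
        file.seek(offset);
        long remaining = length;
        while (remaining > 0) {
            int toRead = (int) Math.min(buffer.length, remaining);
            int n = file.read(buffer, 0, toRead);
            if (n < 0) {
                throw new EOFException("Range extends past end of file");
            }
            out.write(buffer, 0, n);
            remaining -= n;
        }
    }
}
```

Memory use here is bounded by the buffer size (64 KB) no matter how long the stream is; whether iText 5 can be made to take such a path for this stream is not something this sketch claims.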
Update: creating a big PDF
It's difficult to share a 100 MB file, but here is how to create one:
- Run: dd if=/dev/urandom of=file.bin bs=1048000 count=100
- Go to http://blog.didierstevens.com/programs/pdf-tools/ and download http://didierstevens.com/files/software/make-pdf_V0_1_6.zip
- Unzip it and run: python make-pdf-embedded.py file.bin file.pdf
That's it.
Note that it's important to use /dev/urandom: with /dev/zero the embedded data is compressed so well that the resulting PDF is only about 100 KB.
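Why the input matters: the embedded data ends up compressed, and assuming it is Flate (zlib) compressed, deflate's best-case ratio is roughly 1000:1, so 100 MB of zeros shrinks to about 100 KB, while random bytes barely compress at all. A small JDK-only sketch with java.util.zip.Deflater on 1 MB inputs shows the effect; the class name is hypothetical, and java.util.Random stands in for /dev/urandom.

```java
import java.util.Random;
import java.util.zip.Deflater;

// Hypothetical demo class comparing compressibility of zeros vs. random bytes.
class Compressibility {

    /** Returns the size of data after zlib/deflate compression at the default level. */
    static int deflatedSize(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        byte[] buffer = new byte[64 * 1024];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        int size = 1024 * 1024;            // 1 MB instead of 100 MB, to keep the demo quick
        byte[] zeros = new byte[size];     // like /dev/zero
        byte[] random = new byte[size];
        // Stand-in for /dev/urandom: not cryptographically random, but deflate can't compress it
        new Random(42).nextBytes(random);
        System.out.println("zeros:  " + deflatedSize(zeros) + " bytes");
        System.out.println("random: " + deflatedSize(random) + " bytes");
    }
}
```

The zero-filled input compresses to around a kilobyte, while the random input stays slightly larger than 1 MB, which is why a test file built from /dev/zero does not reproduce the memory problem.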
Anyway, if you need my actual file, I've put a 50 MB version on a server: http://50mpdf.tk/50m.pdf
Please download the free ebook Digital Signatures for PDF documents. Section 2.2.4 is entitled "Signing large PDF files". It explains how to sign a document using a temporary file instead of keeping the file in memory:
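The ebook's listing is not reproduced here. A minimal sketch of signing with a temporary file, assuming iText 5.5.x (com.itextpdf:itextpdf) with BouncyCastle on the classpath; the class name, method name and parameter list are illustrative rather than the book's code:

```java
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfSignatureAppearance;
import com.itextpdf.text.pdf.PdfStamper;
import com.itextpdf.text.pdf.security.BouncyCastleDigest;
import com.itextpdf.text.pdf.security.ExternalDigest;
import com.itextpdf.text.pdf.security.ExternalSignature;
import com.itextpdf.text.pdf.security.MakeSignature;
import com.itextpdf.text.pdf.security.MakeSignature.CryptoStandard;
import com.itextpdf.text.pdf.security.PrivateKeySignature;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.PrivateKey;
import java.security.cert.Certificate;

// Illustrative class name; assumes iText 5.5.x and BouncyCastle.
class SignWithTempFile {

    /**
     * Signs src into dest. tmp is a temporary file (or a directory in which
     * iText creates one) that iText uses instead of an in-memory buffer.
     */
    static void sign(String src, String tmp, String dest,
                     Certificate[] chain, PrivateKey pk,
                     String digestAlgorithm, String provider,
                     CryptoStandard subfilter, String reason, String location)
            throws GeneralSecurityException, IOException, DocumentException {
        PdfReader reader = new PdfReader(src);
        try (FileOutputStream os = new FileOutputStream(dest)) {
            // The extra File argument makes iText write to tmp instead of memory
            PdfStamper stamper = PdfStamper.createSignature(reader, os, '\0', new File(tmp));
            PdfSignatureAppearance appearance = stamper.getSignatureAppearance();
            appearance.setReason(reason);
            appearance.setLocation(location);
            ExternalSignature pks = new PrivateKeySignature(pk, digestAlgorithm, provider);
            ExternalDigest digest = new BouncyCastleDigest();
            // signDetached closes the stamper itself; no explicit stamper.close()
            MakeSignature.signDetached(appearance, digest, pks, chain,
                    null, null, null, 0, subfilter);
        }
    }
}
```

In this sketch the result always goes to dest; the temporary file only replaces the in-memory buffer, and the signature is invisible.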
Look at how the PdfStamper instance is created in that example: we add a File object as an extra parameter to the createSignature() method. The tmp variable in that code sample can be a path to a specific file or to a directory. If a directory is chosen, iText will create a file with a unique name in that directory.
If you use the createSignature() method with a temporary file, you can use an OutputStream (the os value) that is null. In that case, the temporary file will serve as the actual destination file. This is good practice if your goal is to store a signed file on your file system. If the OutputStream is not null, iText will always try to delete the temporary file after the signing is done.
Please do not use iText v1 or v2 anymore. The types of signatures created with those versions are outdated, and so are the iText versions themselves (see also https://stackoverflow.com/questions/25696851/can-itext-2-1-7-or-earlier-can-be-used-commercially).
While signing a PDF, iText can use considerably less memory when using
- PdfReader in partial mode;
- PdfStamper configured to use a temporary file; and
- PdfStamper in append mode.
E.g. signing the 50 MB sample file supplied by the OP requires
- -Xmx240m if using neither append mode, nor a temporary file, nor partial mode;
- -Xmx81m if using a temporary file but not append mode (partial mode makes no difference);
- -Xmx7m if using append mode and a temporary file (partial mode makes no difference).
The reason why partial mode makes no difference in the latter cases is that even in non-partial mode the PdfReader does not seem to read stream contents during initialization. As the sample file consists mostly of the contents of a single big stream, the few objects read or not read during initialization don't make a difference, especially as even in partial mode the PdfReader reads and keeps in memory some objects which reflect the global document structure, e.g. the page tree.
You can find my test routines here: CreateSignature.java. I ran them on 64-bit MS Windows with Java 8 using iText 5.5.7-SNAPSHOT (which should not differ from the 5.5.6 release in this context).
Thus, for memory-friendly signing use this variant of @Bruno's code:
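Concretely, that means a PdfReader in partial mode plus createSignature() with both a temporary file and append mode. A sketch assuming iText 5.5.x with BouncyCastle; the class name, method name and the three boolean switches (mirroring the configurations measured above) are hypothetical:

```java
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfSignatureAppearance;
import com.itextpdf.text.pdf.PdfStamper;
import com.itextpdf.text.pdf.security.BouncyCastleDigest;
import com.itextpdf.text.pdf.security.ExternalDigest;
import com.itextpdf.text.pdf.security.ExternalSignature;
import com.itextpdf.text.pdf.security.MakeSignature;
import com.itextpdf.text.pdf.security.MakeSignature.CryptoStandard;
import com.itextpdf.text.pdf.security.PrivateKeySignature;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.PrivateKey;
import java.security.cert.Certificate;

// Hypothetical helper exposing the three memory-related switches.
class MemoryFriendlySigning {

    static void sign(String src, String tmp, String dest,
                     boolean partial, boolean useTempFile, boolean append,
                     Certificate[] chain, PrivateKey pk, String digestAlgorithm,
                     String provider, CryptoStandard subfilter)
            throws GeneralSecurityException, IOException, DocumentException {
        // Partial mode: objects are read lazily instead of all at open time
        PdfReader reader = new PdfReader(src, null, partial);
        try (FileOutputStream os = new FileOutputStream(dest)) {
            // tempFile == null makes iText buffer the signed result in memory
            File tempFile = useTempFile ? new File(tmp) : null;
            // append == true writes the signature as an incremental update
            PdfStamper stamper = PdfStamper.createSignature(reader, os, '\0', tempFile, append);
            PdfSignatureAppearance appearance = stamper.getSignatureAppearance();
            ExternalSignature pks = new PrivateKeySignature(pk, digestAlgorithm, provider);
            ExternalDigest digest = new BouncyCastleDigest();
            // signDetached closes the stamper itself; no explicit stamper.close()
            MakeSignature.signDetached(appearance, digest, pks, chain,
                    null, null, null, 0, subfilter);
        }
    }
}
```

Calling it with partial, useTempFile and append all set to true gives the memory-friendly variant (the -Xmx7m case above). Append mode plausibly helps because unchanged objects, including the big EmbeddedFile stream, are kept as the original bytes instead of being re-serialized through PRStream.toPdf() as in the OP's stack trace.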