Signing PDF - memory consumption

Published 2020-03-25 15:00

Question:

I tried some utilities for digital PDF signing based on iText v1 or v2 and found that the whole PDF seems to be loaded into memory (for a 60 MB PDF, the process can take up to 300-400 MB of memory).

Can recent iText versions sign a PDF without loading it into memory?

Updates

I tested Bruno's example with itextpdf 5.5.6:

  • The PdfReader constructor doesn't matter: it can be (src), (src, null, true), or (src, null, false); the result is the same.
  • What matters is passing new File(tmp) to createSignature.

But memory consumption is still too high. I tried to sign a 100M file (a PDF with an embedded attachment), and peak memory was about 325M. Sure, that's better than the 540M without a temporary file, but not good enough.

With a 32K file, maximum memory was 65M (that's the JVM and the Java code itself, I guess).

Memory was measured with /usr/bin/time -v java ....
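`/usr/bin/time -v` reports figures for the whole process, JVM overhead included. To look at the Java heap alone, a rough sketch like the following (an editorial assumption, not part of the original measurements) sums the peak usage that `java.lang.management` reports for each heap memory pool. Because the per-pool peaks need not occur at the same moment, the sum is an upper bound rather than an exact peak. `PeakHeap` and the 32 MB stand-in allocation are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

final class PeakHeap {
    /** Sums the peak "used" bytes of all heap memory pools (an upper bound on the true peak). */
    static long peakHeapBytes() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as Metaspace and the code cache
            }
            MemoryUsage peak = pool.getPeakUsage(); // null if the pool is no longer valid
            if (peak != null) {
                total += peak.getUsed();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the signing work being measured.
        byte[] work = new byte[32 * 1024 * 1024];
        work[work.length - 1] = 1;
        System.out.println("peak heap (upper bound): "
                + peakHeapBytes() / (1024 * 1024) + " MB, array length " + work.length);
    }
}
```

Calling `PeakHeap.peakHeapBytes()` at the end of a signing run gives a heap-only figure to compare against the `-Xmx` values discussed in this thread.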

I limited Java memory with -Xmx100m, but it crashed with an OutOfMemoryError:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at com.itextpdf.text.pdf.PdfReader.getStreamBytesRaw(PdfReader.java:2576)
    at com.itextpdf.text.pdf.PdfReader.getStreamBytesRaw(PdfReader.java:2615)
    at com.itextpdf.text.pdf.PRStream.toPdf(PRStream.java:230)
    at com.itextpdf.text.pdf.PdfIndirectObject.writeTo(PdfIndirectObject.java:158)
    at com.itextpdf.text.pdf.PdfWriter$PdfBody.write(PdfWriter.java:420)
    at com.itextpdf.text.pdf.PdfWriter$PdfBody.add(PdfWriter.java:398)
    at com.itextpdf.text.pdf.PdfWriter.addToBody(PdfWriter.java:887)
    at com.itextpdf.text.pdf.PdfStamperImp.close(PdfStamperImp.java:412)
    at com.itextpdf.text.pdf.PdfStamperImp.close(PdfStamperImp.java:386)
    at com.itextpdf.text.pdf.PdfSignatureAppearance.preClose(PdfSignatureAppearance.java:1316)
    at com.itextpdf.text.pdf.security.MakeSignature.signDetached(MakeSignature.java:140)

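The code at the top of the stack trace (an excerpt of PdfReader.getStreamBytesRaw) is:

public static byte[] getStreamBytesRaw(final PRStream stream, final RandomAccessFileOrArray file) throws IOException {
    PdfReader reader = stream.getReader();
    byte b[];
    if (stream.getOffset() < 0)
        b = stream.getBytes();
    else {
        b = new byte[stream.getLength()]; // <---- OutOfMemoryError is thrown here
        file.readFully(b);
        // ... (rest of the method omitted)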

In the debugger I see that the stream type is EmbeddedFile and its length is 100M, so the whole embedded file is being read into memory.
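The excerpt allocates `new byte[stream.getLength()]`, so heap use grows with the size of the embedded file. For comparison, here is a minimal sketch of the usual alternative pattern: copying the same region of the file through a fixed-size buffer, so memory stays constant no matter how long the stream is. `RegionCopy` and `copyRegion` are hypothetical names. This is not iText's code, and like the "raw" method in the excerpt it copies undecoded bytes only.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;

final class RegionCopy {
    // Fixed buffer: heap use stays at 64 KB however long the region is.
    static final int BUFFER_SIZE = 64 * 1024;

    /**
     * Copies length bytes starting at offset in file to out, one buffer at a time,
     * instead of allocating a byte[] of the full length.
     */
    static void copyRegion(RandomAccessFile file, long offset, long length, OutputStream out)
            throws IOException {
        byte[] buf = new byte[BUFFER_SIZE];
        file.seek(offset);
        long remaining = length;
        while (remaining > 0) {
            int n = file.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n < 0) {
                throw new IOException("Unexpected end of file, " + remaining + " bytes missing");
            }
            out.write(buf, 0, n);
            remaining -= n;
        }
    }
}
```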

Update: creating a big PDF

It's difficult to share a 100M file, but here is how to create one:

  1. Run dd if=/dev/urandom of=file.bin bs=1048000 count=100
  2. Go to http://blog.didierstevens.com/programs/pdf-tools/ and download http://didierstevens.com/files/software/make-pdf_V0_1_6.zip
  3. Unzip and run python make-pdf-embedded.py file.bin file.pdf
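If `dd` is not available (for example on Windows), step 1 can be approximated in Java. The following sketch is an assumption, not part of the original recipe: it writes `count` blocks of `blockSize` bytes from `java.security.SecureRandom`, standing in for `/dev/urandom`, and holds only one block in memory at a time. `RandomFile` and `writeRandom` are hypothetical names.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.SecureRandom;

final class RandomFile {
    /**
     * Writes count blocks of blockSize random bytes to target, similar to
     * dd if=/dev/urandom of=target bs=blockSize count=count.
     */
    static void writeRandom(Path target, int blockSize, int count) throws IOException {
        SecureRandom random = new SecureRandom();
        byte[] block = new byte[blockSize]; // only one block is held in memory
        try (OutputStream out = Files.newOutputStream(target)) {
            for (int i = 0; i < count; i++) {
                random.nextBytes(block);
                out.write(block);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Same sizes as step 1: 100 blocks of 1048000 bytes (about 100 MB).
        writeRandom(Paths.get("file.bin"), 1048000, 100);
    }
}
```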

Here you are )

Note that it's important to use /dev/urandom: with /dev/zero, the content compresses so well that the resulting PDF is only about 100K.
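The difference between `/dev/zero` and `/dev/urandom` is easy to see with `java.util.zip.Deflater`, which produces zlib/Deflate data, the same format used by the common PDF `FlateDecode` filter. Assuming `make-pdf-embedded.py` compresses the embedded stream this way (the 100K result suggests so, but the script itself was not checked), this sketch compares the compressed size of 1 MB of zeros with 1 MB of pseudo-random bytes. `CompressionCheck` is a hypothetical name.

```java
import java.util.Random;
import java.util.zip.Deflater;

final class CompressionCheck {
    /** Returns the size of data after zlib/Deflate compression at the default level. */
    static int deflatedSize(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        byte[] buf = new byte[64 * 1024];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buf); // count output bytes, discard the data itself
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        int n = 1024 * 1024;
        byte[] zeros = new byte[n];
        byte[] random = new byte[n];
        new Random(42).nextBytes(random);
        System.out.println("zeros:  " + n + " -> " + deflatedSize(zeros));
        System.out.println("random: " + n + " -> " + deflatedSize(random));
    }
}
```

Zeros shrink to a tiny fraction of their size, while random bytes stay at (or slightly above) their original size, which is why only random content yields a test PDF as large as its attachment.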

Anyway, if you need my file, I've put a 50M file on a server: http://50mpdf.tk/50m.pdf

Answer 1:

While signing a PDF, iText uses significant amounts of memory by:

  • reading the whole unsigned PDF into memory unless using a PdfReader in partial mode;
  • creating the signed file in memory unless using a PdfStamper configured to use a temporary file; and
  • reading whole individual PDF objects (e.g. streams containing embedded files) into memory when copying the unsigned data to the to-be-signed file unless using a PdfStamper in append mode.
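Append mode means iText writes an incremental update, the mechanism the PDF specification provides for modifying a file: the original bytes stay unchanged, and the new or changed objects, a new cross-reference section and a new trailer are appended after them. The byte-level sketch below shows why that is cheap, assuming the new revision is already serialized: the original file is streamed to the destination without being parsed, so big embedded files never have to be read into a `byte[]`. `IncrementalUpdate`, `appendRevision` and `newRevision` are hypothetical, and building a valid revision (objects, xref, trailer) is out of scope here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class IncrementalUpdate {
    /**
     * Writes dest = original bytes copied verbatim, followed by newRevision.
     * The original is never parsed or re-serialized, so heap use does not
     * depend on the size of the objects it contains.
     */
    static void appendRevision(Path original, byte[] newRevision, Path dest) throws IOException {
        try (OutputStream out = Files.newOutputStream(dest)) {
            Files.copy(original, out); // streamed through a buffer, no whole-file byte[]
            out.write(newRevision);    // only the (small) new revision is in memory
        }
    }
}
```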

For example, signing the 50 MB sample file supplied by the OP requires:

  • about -Xmx240m if using neither append mode, nor a temporary file, nor partial mode;
  • about -Xmx81m if using a temporary file but not append mode, partial mode makes no difference;
  • about -Xmx7m if using append mode and a temporary file, partial mode makes no difference.

The reason partial mode makes no difference in the latter cases is that, even in non-partial mode, the PdfReader does not seem to read stream contents during initialization. As the sample file consists mostly of the contents of a single big stream, the few objects read or not read during initialization make no difference, especially since even in partial mode the PdfReader reads and keeps in memory some objects reflecting the global document structure, e.g. the page tree.

You can find my test routines here: CreateSignature.java. I ran them on 64-bit MS Windows with Java 8 using iText 5.5.7-SNAPSHOT (which should not differ from the 5.5.6 release in this context).

Thus, for memory-friendly signing, use this variant of @Bruno's code:

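// iText 5.x imports
import java.io.File;
import java.io.FileOutputStream;
import com.itextpdf.text.Rectangle;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfSignatureAppearance;
import com.itextpdf.text.pdf.PdfStamper;
import com.itextpdf.text.pdf.security.*;

// filepath, dest, tmp, reason, location, pk, chain, digestAlgorithm,
// provider and subfilter are defined by the surrounding code.
// Creating the reader (third argument true = partial mode) and the stamper
PdfReader reader = new PdfReader(filepath, null, true);
FileOutputStream os = new FileOutputStream(dest);
// '\0' keeps the original PDF version, new File(tmp) makes iText use a
// temporary file instead of memory, and the final true enables append mode
PdfStamper stamper =
    PdfStamper.createSignature(reader, os, '\0', new File(tmp), true);
// Creating the appearance
PdfSignatureAppearance appearance = stamper.getSignatureAppearance();
appearance.setReason(reason);
appearance.setLocation(location);
appearance.setVisibleSignature(new Rectangle(36, 748, 144, 780), 1, "sig");
// Creating the signature
ExternalSignature pks = new PrivateKeySignature(pk, digestAlgorithm, provider);
ExternalDigest digest = new BouncyCastleDigest();
MakeSignature.signDetached(appearance, digest, pks, chain,
    null, null, null, 0, subfilter);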
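Assuming append mode produces an incremental update that leaves the original revision byte-for-byte intact, a cheap sanity check after signing is that the unsigned file is an exact prefix of the signed one. This hypothetical helper (`AppendCheck.isPrefix`, not an iText API) compares both files through buffered streams, so it also runs in constant memory.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class AppendCheck {
    /** Returns true if the bytes of original form a prefix of signed. */
    static boolean isPrefix(Path original, Path signed) throws IOException {
        if (Files.size(original) > Files.size(signed)) {
            return false; // a longer file cannot be a prefix
        }
        try (InputStream a = new BufferedInputStream(Files.newInputStream(original));
             InputStream b = new BufferedInputStream(Files.newInputStream(signed))) {
            int x;
            while ((x = a.read()) != -1) {
                if (x != b.read()) {
                    return false; // first differing byte
                }
            }
            return true;
        }
    }
}
```

For the code above, `AppendCheck.isPrefix(Paths.get(filepath), Paths.get(dest))` would be expected to return `true`.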


Answer 2:

Please download the free ebook Digital Signatures for PDF documents. Section 2.2.4 is entitled "Signing large PDF files". It explains how to sign a document using a temporary file instead of keeping the file in memory:

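// iText 5.x imports
import java.io.File;
import java.io.FileOutputStream;
import com.itextpdf.text.Rectangle;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfSignatureAppearance;
import com.itextpdf.text.pdf.PdfStamper;
import com.itextpdf.text.pdf.security.*;

// filepath, dest, tmp, reason, location, pk, chain, digestAlgorithm,
// provider and subfilter are defined by the surrounding code.
// Creating the reader and the stamper
PdfReader reader = new PdfReader(filepath, null, true);
FileOutputStream os = new FileOutputStream(dest);
// '\0' keeps the original PDF version; new File(tmp) makes iText
// write to a temporary file instead of keeping the file in memory
PdfStamper stamper =
    PdfStamper.createSignature(reader, os, '\0', new File(tmp));
// Creating the appearance
PdfSignatureAppearance appearance = stamper.getSignatureAppearance();
appearance.setReason(reason);
appearance.setLocation(location);
appearance.setVisibleSignature(new Rectangle(36, 748, 144, 780), 1, "sig");
// Creating the signature
ExternalSignature pks = new PrivateKeySignature(pk, digestAlgorithm, provider);
ExternalDigest digest = new BouncyCastleDigest();
MakeSignature.signDetached(appearance, digest, pks, chain,
    null, null, null, 0, subfilter);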

Do you see how we create the PdfStamper instance? We add a File object as an extra parameter to the createSignature() method. The tmp variable in this code sample can be a path to a specific file or to a directory. In case a directory is chosen, iText will create a file with a unique name in that directory.
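The file-or-directory rule for `tmp` can be pictured with a small helper. This is a hypothetical sketch of the behavior described above, not iText's implementation: a directory gets a uniquely named file created inside it via `File.createTempFile`, and any other path is used as-is. The prefix `"pdf"` and suffix `".tmp"` are arbitrary assumptions.

```java
import java.io.File;
import java.io.IOException;

final class TempTarget {
    /**
     * Hypothetical helper: if tmp is a directory, create a uniquely named file
     * inside it; otherwise use tmp itself as the temporary file.
     */
    static File resolve(File tmp) throws IOException {
        if (tmp.isDirectory()) {
            // createTempFile guarantees a name not already used in that directory
            return File.createTempFile("pdf", ".tmp", tmp);
        }
        return tmp;
    }
}
```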

If you use the createSignature() method with a temporary file, you can pass null as the OutputStream (the os value). In that case, the temporary file serves as the actual destination file. This is good practice if your goal is to store a signed file on your file system. If the OutputStream is not null, iText will always try to delete the temporary file after the signing is done.
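The two cases for the `OutputStream` can be sketched as a hypothetical finishing step (not iText's code): with `os == null` the temporary file is the signed result and is kept; otherwise its contents are streamed to `os` and the temporary file is deleted on a best-effort basis. `TempFinish.finish` is an illustrative name.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;

final class TempFinish {
    /**
     * Returns the file holding the signed PDF when os is null; otherwise copies
     * the temporary file to os, tries to delete it, and returns null.
     */
    static File finish(File tempFile, OutputStream os) throws IOException {
        if (os == null) {
            return tempFile; // the temporary file doubles as the destination
        }
        Files.copy(tempFile.toPath(), os); // streamed, no whole-file byte[]
        if (!tempFile.delete()) {
            tempFile.deleteOnExit(); // best effort if immediate deletion fails
        }
        return null;
    }
}
```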

Please do not use iText v1 or v2 anymore. The types of signatures created with those versions are outdated, and so are the iText versions (see also https://stackoverflow.com/questions/25696851/can-itext-2-1-7-or-earlier-can-be-used-commercially).