Copying a PDF form with PdfCopy not working in iTextSharp

Posted 2019-01-15 16:01

Question:

In the release notes of iText 5.4.4 it says:

From now on you can now merge forms and preserve the tagged PDF structure when using the addDocument() method in PdfCopy. At the same time, we've deprecated PdfCopyFields.*

I am trying to merge multiple PDF documents into one. If one of these documents is a PDF form with AcroFields, those fields are invisible in the output document. This happens when I use the AddDocument() method of PdfCopy. When I use the AddDocument() method of PdfCopyFields, it works fine. PdfCopyFields is deprecated in iTextSharp, but is PdfCopy working correctly? There is another reason not to use PdfCopyFields (from "iText in Action"):

Don’t use PdfCopyFields to concatenate PDF documents without form fields. As opposed to concatenating documents using PdfCopy, PdfCopyFields needs to keep all the documents in memory to update the combined form. This can become problematic if you’re trying to concatenate large documents.

This is the code I use:

public static void MergePdfs4()
{
    var f1 = @"C:\Users\paulusj\Downloads\OoPdfFormExampleFilled.pdf";
    var f2 = @"c:\GEODAN\work\EV_Original.pdf";

    using (Stream outputPdfStream = new FileStream("combined4.pdf", FileMode.Create, FileAccess.Write,
        FileShare.None))
    {
        var document = new Document();
        var copy = new PdfCopy(document, outputPdfStream);
        document.Open();
        copy.AddDocument(new PdfReader(f1));
        copy.AddDocument(new PdfReader(f2));
        copy.Close();
    }
}

The strange thing is that when I first save a copy of EV_Original.pdf with Adobe Reader's "Save As" and merge that copy instead, the result is (almost) correct: I can see the form fields in the output PDF.
When I use this code:

public static void MergePdfs3()
{
    var f1 = @"C:\Users\paulusj\Downloads\OoPdfFormExampleFilled.pdf";
    var f2 = @"c:\GEODAN\work\EV_Original.pdf";

    using (Stream outputPdfStream = new FileStream("combined3.pdf", FileMode.Create, FileAccess.Write,
        FileShare.None))
    {

        var copy = new PdfCopyFields(outputPdfStream);
        copy.AddDocument(new PdfReader(f1));
        copy.AddDocument(new PdfReader(f2));
        copy.Close();
    }
}

It works fine, but this code uses the deprecated PdfCopyFields.

The pdfs used can be found here:
Example.pdf
EV_Original.pdf

Is there something wrong with EV_Original.pdf, or is PdfCopy not implemented correctly?

Answer 1:

There are several issues here.

1) You have to enable form field merging for PdfCopy:

// ...
var copy = new PdfCopy(document, outputPdfStream);
copy.SetMergeFields();
document.Open();
// ...

This works with iText 5.4.5 (Java). With iTextSharp, however, Adobe Reader and Acrobat complain about an embedded font when displaying page 2 of the merged document. This is probably a porting issue.
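For reference, here is a minimal sketch of the complete merge with field merging enabled, assuming iTextSharp 5.4.5 or later. The class name PdfFormMerger and the method Merge are hypothetical helpers introduced only for illustration. The readers are closed only after the document is closed, on the assumption that the merged form is written out at close time.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class PdfFormMerger
{
    // Hypothetical helper: concatenates the input PDFs into outputPath
    // and asks PdfCopy to merge their AcroForm fields.
    public static void Merge(string outputPath, params string[] inputPaths)
    {
        using (Stream output = new FileStream(outputPath, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            var document = new Document();
            var copy = new PdfCopy(document, output);
            copy.SetMergeFields();      // enable form merging before opening, as in the snippet above
            document.Open();

            var readers = new List<PdfReader>();
            foreach (string path in inputPaths)
            {
                var reader = new PdfReader(path);
                readers.Add(reader);    // keep the reader open until the output is finished
                copy.AddDocument(reader);
            }

            document.Close();           // finishes the output and closes the writer
            foreach (PdfReader reader in readers)
            {
                reader.Close();
            }
        }
    }
}
```

A call such as PdfFormMerger.Merge("combined4.pdf", f1, f2) would correspond to the MergePdfs4 method from the question.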

2) EV_Original.pdf doesn't have appearances ("visualizations") for the form fields. Instead it has the NeedAppearances flag set. This indicates the PDF viewer needs to generate appearances when displaying the document.
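To see whether a given input file relies on this flag, you can inspect its AcroForm dictionary. The following is a rough sketch, assuming iTextSharp 5.x; AcroFormInspector and NeedsAppearances are hypothetical names chosen for this example.

```csharp
using iTextSharp.text.pdf;

public static class AcroFormInspector
{
    // Hypothetical helper: returns true if the document's AcroForm
    // dictionary contains /NeedAppearances with the value true.
    public static bool NeedsAppearances(string path)
    {
        var reader = new PdfReader(path);
        try
        {
            PdfDictionary acroForm = reader.Catalog.GetAsDict(PdfName.ACROFORM);
            if (acroForm == null)
            {
                return false;           // the document has no form at all
            }
            PdfBoolean flag = acroForm.GetAsBoolean(PdfName.NEEDAPPEARANCES);
            return flag != null && flag.BooleanValue;
        }
        finally
        {
            reader.Close();
        }
    }
}
```

Running this on each input and on the merged output should show whether the flag was lost during the merge.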

PdfCopy doesn't process NeedAppearances correctly at the moment, so it isn't set in the output document. This needs to be fixed in iText. As a workaround, you could set NeedAppearances on your output document after merging:

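// Post-process the merged file: set /NeedAppearances on its AcroForm dictionary
PdfReader postreader = new PdfReader("combined4.pdf");
PdfDictionary acroForm = postreader.Catalog.GetAsDict(PdfName.ACROFORM);
if (acroForm != null)
    acroForm.Put(PdfName.NEEDAPPEARANCES, PdfBoolean.PDFTRUE);
PdfStamper poststamper = new PdfStamper(postreader, new FileStream("combined4-needappearances.pdf", FileMode.Create));
poststamper.Close();
postreader.Close();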

But taking into account the porting bug in iTextSharp 5.4.5, I'd suggest using PdfCopyFields until PdfCopy is fixed in the next release. Memory usage for PdfCopyFields and PdfCopy is similar when merging AcroForms. This is inherent to AcroForm merging: more information needs to be kept in memory. That's why it has to be explicitly enabled in PdfCopy using SetMergeFields().