Merging Tagged PDF without ruining the tags

I am trying to merge two tagged PDFs with the iText 5.4.4 jar (itextpdf). Everything else runs without error, but closing the document on the line document.close(); throws the error below:

java.lang.NullPointerException
PDF Creation Failed java.lang.NullPointerException
[B@1d5c1d5c
at com.itextpdf.text.pdf.PdfCopy.fixTaggedStructure(PdfCopy.java:878)
at com.itextpdf.text.pdf.PdfCopy.flushTaggedObjects(PdfCopy.java:799)
at com.itextpdf.text.pdf.PdfDocument.close(PdfDocument.java:836)
at com.itextpdf.text.Document.close(Document.java:416)
at PDFMerger.mergePDF(PDFMerger.java:189)

Please let me know what could be the cause of this issue.

Below is the code I use.

PdfReader reader = new PdfReader(pdf);
boolean setTagged = reader.isTagged();  // computed but never used below

Document document = new Document();
PdfCopy copy = new PdfCopy(document, new FileOutputStream("Merged.pdf"));
copy.setTagged();  // called unconditionally
document.open();

// Copy every page (page numbers are 1-based), keeping the tagged structure
int n = reader.getNumberOfPages();
for (int page = 1; page <= n; page++) {
    copy.addPage(copy.getImportedPage(reader, page, true));
}
copy.freeReader(reader);
document.close();  // the NullPointerException is thrown here
reader.close();

Tags: pdf itext itextpdf merging-data

2 answers

【Aperson】

#2 · 2019-03-01 16:09

This looks like a bug in the current iText versions.

@Bruno maybe someone should look into this

PdfCopy has a method fixTaggedStructure, which tries to repair the tag structure after it has been somewhat garbled by copying tagged pages. In all versions up to and including the current iText 5.4.6-SNAPSHOT, it contains the following code:

PdfDictionary dict = (PdfDictionary)iobj.object;
PdfIndirectReference pg = (PdfIndirectReference)dict.get(PdfName.PG);
//if pg is real page - do nothing, else set correct pg and remove first MCID if exists
if (!pageReferences.contains(pg) && !pg.equals(currPage)){
    dict.put(PdfName.PG, currPage);
    PdfArray kids = dict.getAsArray(PdfName.K);
    if (kids != null) {
        PdfObject firstKid = kids.getDirectObject(0);
        if (firstKid.isNumber()) kids.remove(0);
    }
}

Here dict is a StructElem dictionary referenced from some array. By calling pg.equals(currPage), the code implicitly assumes that dict has an entry for the key PdfName.PG. Unfortunately that entry is optional; the sample document provided by the OP, for example, contains StructElem dictionaries referenced from an array that have no Pg entry. For those, pg is null, and dereferencing it causes the NPE in question.
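The failure mode can be reproduced without iText. What follows is a minimal sketch in plain Java, assuming only that a dictionary lookup for a missing key returns null. The class MissingPgDemo, the HashMap, and the string values are stand-ins invented for illustration; they are not iText types.

```java
import java.util.HashMap;
import java.util.Map;

class MissingPgDemo {
    // Mirrors the shape of the failing check: look up "Pg", then call equals on the result.
    static boolean throwsOnMissingPg(Map<String, Object> dict, Object currPage) {
        Object pg = dict.get("Pg");      // null when the optional entry is absent
        try {
            pg.equals(currPage);         // dereferences pg, like pg.equals(currPage) in the snippet
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> withPg = new HashMap<>();
        withPg.put("Pg", "page 1");
        Map<String, Object> withoutPg = new HashMap<>();  // stand-in for a StructElem without Pg

        System.out.println(throwsOnMissingPg(withPg, "page 2"));     // prints false
        System.out.println(throwsOnMissingPg(withoutPg, "page 2"));  // prints true
    }
}
```

Only the dictionary without the entry triggers the exception, which matches the observation that the problem depends on the input document.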

In this case it suffices to change the order in the equals call, i.e. instead of

if (!pageReferences.contains(pg) && !pg.equals(currPage)){

one should use

if (!pageReferences.contains(pg) && !currPage.equals(pg)){

or

if (pg != null && !pageReferences.contains(pg) && !pg.equals(currPage)){

depending on the actual program logic here.
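The two replacement conditions are not interchangeable, which is why the choice depends on the intended logic. The sketch below compares them in plain Java; the class GuardVariants, its method names, and the string stand-ins for page references are hypothetical, and it assumes that currPage is never null and that the pageReferences collection tolerates a null lookup.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class GuardVariants {
    // Variant 1: swapped equals call. A null pg is treated as "not a real page",
    // so the fix-up block would run and assign currPage.
    static boolean swappedEquals(Set<String> pageReferences, String pg, String currPage) {
        return !pageReferences.contains(pg) && !currPage.equals(pg);
    }

    // Variant 2: explicit null check. A null pg skips the fix-up block entirely.
    static boolean nullChecked(Set<String> pageReferences, String pg, String currPage) {
        return pg != null && !pageReferences.contains(pg) && !pg.equals(currPage);
    }

    public static void main(String[] args) {
        Set<String> pageReferences = new HashSet<>(Arrays.asList("page 1"));
        String currPage = "page 2";

        // null = no Pg entry, "stale" = reference to a page that was not copied
        for (String pg : new String[] {null, "stale", "page 1", "page 2"}) {
            System.out.println(pg + ": swapped=" + swappedEquals(pageReferences, pg, currPage)
                    + ", nullChecked=" + nullChecked(pageReferences, pg, currPage));
        }
    }
}
```

Under these assumptions the variants agree whenever pg is present and differ only when it is missing: the swapped call enters the block (so the element would receive a Pg entry and possibly lose its first numeric kid), while the null check leaves the element untouched.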

@Bruno Please check which variant is semantically correct; I'm not really into this tagged structure stuff after all...


Root（大扎）

#3 · 2019-03-01 16:20

This code is written in C#:

public static byte[] mergeTest(byte[] pdf) {
    PdfReader reader = null;
    Document doc = null;
    PdfCopy copy = null;
    MemoryStream stream = new MemoryStream();
    byte[] output = null;

    try {
        reader = new PdfReader(pdf);
        doc = new Document();
        copy = new PdfCopy(doc, stream);

        // Request tagged output only when the source document is tagged
        bool tagged = reader.IsTagged();
        if (tagged)
            copy.SetTagged();

        doc.Open();

        // Page numbers are 1-based; keep the tag structure only for tagged input
        for (int x = 1; x <= reader.NumberOfPages; x++) {
            copy.AddPage(copy.GetImportedPage(reader, x, tagged));
        }

        copy.FreeReader(reader);
        doc.Close();
        copy.Close();

        output = stream.ToArray();

        stream.Flush();
        stream.Dispose();
    } catch (Exception) {
        // Errors are swallowed here, so the method returns null on failure
    } finally {
        try {
            if (reader != null)
                reader.Close();
        } catch (Exception) { }
    }
    return output;
}
