Get the page number from the document outline (bookmarks)

Posted 2019-06-13 13:31

Question:

I am using the iText 7 library to manipulate some existing PDFs. For some reason, I am not able to get the page number from the outline. I guess I should somehow get it from the PdfDestination, but I cannot find any matching method in any of its subclasses.

using System;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Navigation;

PdfDocument pdfDoc = new PdfDocument(new PdfReader("example.pdf"));
var root = pdfDoc.GetOutlines(false);
foreach (PdfOutline ol in root.GetAllChildren()) {
    Console.WriteLine(ol.GetTitle());
    PdfDestination d = ol.GetDestination();
    // how do I get the page number from the destination object?
}

In iText 5 I used SimpleBookmark.GetBookmark(reader), which returned a list of dictionaries containing a "Page" entry, but this functionality seems to have been removed in iText 7.

Edit: I had a look at the .NET implementation of PdfExplicitDestination.getDestinationPage() on GitHub (the Java implementation is the same). I don't understand the purpose of the parameter to this method. If I pass in null and call ToString() on the result, it seems to work for PDFs that use only one level in the outline hierarchy. By "work" I mean that it returns the zero-indexed page number as a string. For another PDF, however, the code does not find the page number at all (not even at the first level).

PdfDocument pdfDoc = new PdfDocument(new PdfReader("example.pdf"));
var root = pdfDoc.GetOutlines(false);
foreach (PdfOutline ol in root.GetAllChildren()) {
    Console.WriteLine(ol.GetTitle());
    var d = ol.GetDestination();
    if (d is PdfExplicitDestination) {
        string pageNoStr = d.GetDestinationPage(null).ToString();
        // this is the body of the method (minus the ToString()):
        //string pageNoStr = ((PdfArray)d.GetPdfObject()).Get(0).ToString();
        int pageNo;
        if (Int32.TryParse(pageNoStr, out pageNo)) {
            Console.WriteLine("Page is " + pageNo);
        } else {
            Console.WriteLine("Error page");
        }
    }
}

So I am still trying to figure this out.

Answer 1:

Regarding the levels of the outline hierarchy: to traverse the whole hierarchy, you have to check each PdfOutline's children and traverse them recursively.
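The recursion can be illustrated without iText. The sketch below uses a hypothetical OutlineNode class as a stand-in for PdfOutline (a title plus a list of children). It is not an iText type. The sketch walks the tree depth-first and indents each title by its nesting level. With iText, the same pattern would be applied to PdfOutline.getAllChildren().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for PdfOutline: a title plus child items.
// This is NOT an iText type; it only models the shape of the bookmark tree.
final class OutlineNode {
    final String title;
    final List<OutlineNode> children = new ArrayList<>();

    OutlineNode(String title) {
        this.title = title;
    }

    // Appends a child and returns this node, so trees can be built fluently.
    OutlineNode add(OutlineNode child) {
        children.add(child);
        return this;
    }

    // Depth-first, pre-order walk: record the item, then recurse into its children.
    // Each nesting level adds two spaces of indentation.
    static void walk(OutlineNode node, int depth, List<String> out) {
        out.add("  ".repeat(depth) + node.title);
        for (OutlineNode child : node.children) {
            walk(child, depth + 1, out);
        }
    }

    // The root is only a container, so its children start at depth 0.
    static List<String> walkAll(OutlineNode root) {
        List<String> out = new ArrayList<>();
        for (OutlineNode child : root.children) {
            walk(child, 0, out);
        }
        return out;
    }

    public static void main(String[] args) {
        OutlineNode root = new OutlineNode("root")
                .add(new OutlineNode("Chapter 1")
                        .add(new OutlineNode("Section 1.1"))
                        .add(new OutlineNode("Section 1.2")))
                .add(new OutlineNode("Chapter 2"));
        walkAll(root).forEach(System.out::println);
    }
}
```

If a loop visits only the root's direct children, it stops at the first level. The recursive call is what reaches "Section 1.1" and "Section 1.2".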

The names parameter that confused you is responsible for resolving named destinations. It is needed to get the page numbers right in the general case, because a PDF document may contain named destinations as well as explicit ones. To get the names map, you can use pdfDocument.getCatalog().getNameTree(PdfName.Dests).getNames().
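To see why the names map matters, consider the following simplified model. It is a sketch, not iText's API, and its representation is a hypothetical stand-in. A named destination is modelled as a String. An explicit destination is modelled as a List<Object> whose first element is the page. A named destination carries no page of its own, so it can only be resolved by looking it up in the map.

```java
import java.util.List;
import java.util.Map;

// Simplified model of destination resolution (hypothetical representation, not iText):
//   - a named destination is a String,
//   - an explicit destination is a List<Object> such as [pageObject, "XYZ", ...],
//     whose first element is the page.
final class DestinationResolver {

    // Returns the page element of a destination, or null if it cannot be resolved.
    static Object destinationPage(Object dest, Map<String, List<Object>> names) {
        if (dest instanceof String) {
            // Named destination: look up the explicit array the name stands for.
            List<Object> explicit = names.get(dest);
            return (explicit == null || explicit.isEmpty()) ? null : explicit.get(0);
        }
        if (dest instanceof List<?>) {
            // Explicit destination: the page is the first array element.
            List<?> explicit = (List<?>) dest;
            return explicit.isEmpty() ? null : explicit.get(0);
        }
        return null;
    }

    public static void main(String[] args) {
        Object page3 = new Object(); // stands in for a page dictionary
        List<Object> introDest = List.of(page3, "Fit");
        Map<String, List<Object>> names = Map.of("intro", introDest);

        // Without the names map, the named destination cannot be resolved.
        System.out.println(destinationPage("intro", Map.of()) == null);    // true
        System.out.println(destinationPage("intro", names) == page3);      // true
        System.out.println(destinationPage(introDest, Map.of()) == page3); // true
    }
}
```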

To get the page number of a page object, use pdfDocument.getPageNumber(PdfDictionary), which returns a 1-based page number.
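The lookup from a page object to a page number can also be modelled without iText. The sketch below is a hypothetical model, not how iText implements getPageNumber. It treats pages as plain Objects kept in document order and returns the 1-based position of the matching page.

It also covers the case the question ran into: an explicit destination whose first element is a 0-based integer instead of a page reference. The PDF specification reserves integer page numbers for remote (GoToR) destinations, but some files use them anyway. Returning -1 for an unknown page is a choice made only for this sketch.

```java
import java.util.List;

// Simplified model of mapping a destination's page element to a page number.
// Hypothetical representation, not iText: pages are plain Objects in document
// order, and a page element is either one of those objects or an Integer.
final class PageNumbers {

    // Returns the 1-based page number, or -1 if the page element is not recognised.
    static int pageNumber(Object pageElement, List<Object> pagesInOrder) {
        if (pageElement instanceof Integer) {
            // A 0-based page index stored instead of a page reference: shift by one.
            return (Integer) pageElement + 1;
        }
        for (int i = 0; i < pagesInOrder.size(); i++) {
            // Compare by identity, like comparing references to page objects.
            if (pagesInOrder.get(i) == pageElement) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Object p1 = new Object(), p2 = new Object(), p3 = new Object();
        List<Object> pages = List.of(p1, p2, p3);
        System.out.println(pageNumber(p2, pages));           // 2
        System.out.println(pageNumber(0, pages));            // 1
        System.out.println(pageNumber(new Object(), pages)); // -1
    }
}
```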

Overall, a method that walks through the outlines may look as follows:

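import com.itextpdf.kernel.pdf.*;
import java.util.Map;

void walkOutlines(PdfOutline outline, Map<String, PdfObject> names, PdfDocument pdfDocument) {
    if (outline.getDestination() != null) {
        // Resolves explicit and named destinations to the target page object
        PdfObject page = outline.getDestination().getDestinationPage(names);
        // Only a page dictionary can be looked up; an unresolved name yields null
        if (page instanceof PdfDictionary) {
            System.out.println(outline.getTitle() + ": page " +
                    pdfDocument.getPageNumber((PdfDictionary) page));
        }
    }
    // Nested bookmarks: recurse into every child
    for (PdfOutline child : outline.getAllChildren()) {
        walkOutlines(child, names, pdfDocument);
    }
}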

And the entry point that calls the method on the outline root:

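// Named destinations are looked up in the catalog's /Dests name tree
PdfNameTree destsTree = pdfDocument.getCatalog().getNameTree(PdfName.Dests);
PdfOutline root = pdfDocument.getOutlines(false);
if (root != null) { // guard in case the document has no outline tree
    walkOutlines(root, destsTree.getNames(), pdfDocument);
}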

Please note that the code sample is in Java, but the C# version should be similar, apart from the capitalization of method names and IDictionary instead of Map.



Tags: c# itext itext7