How to read PDF bookmarks programmatically

Posted 2019-02-21 13:03

I'm using a PDF converter to access the graphical data within a PDF. Everything works fine, except that I don't get a list of the bookmarks. Is there a command-line app or a C# component that can read a PDF's bookmarks? I found the iText and SharpPDF libraries and I'm currently looking through them. Have you ever done such a thing?

4 answers
\"骚年 ilove
#2 · 2019-02-21 13:23

Try the following code:

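using System.Collections.Generic;
using System.Windows.Forms;
using iTextSharp.text.pdf;

PdfReader pdfReader = new PdfReader(filename);

// GetBookmark returns null when the PDF has no bookmarks.
IList<Dictionary<string, object>> bookmarks = SimpleBookmark.GetBookmark(pdfReader);

for (int i = 0; bookmarks != null && i < bookmarks.Count; i++)
{
    // Each bookmark is a dictionary of properties; "Title" holds its text.
    MessageBox.Show(bookmarks[i]["Title"].ToString());

    // Show the number of properties when the bookmark has more than three.
    if (bookmarks[i].Count > 3)
    {
        MessageBox.Show(bookmarks[i].Count.ToString());
    }
}

pdfReader.Close();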

Note: Don't forget to add a reference to the iTextSharp DLL to your project.
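
The loop above only touches top-level bookmarks. A minimal sketch of walking the whole tree follows, assuming the list returned by SimpleBookmark.GetBookmark keeps each bookmark's text under "Title" and its nested bookmarks under "Kids" as another list of the same shape. BookmarkTree and Flatten are hypothetical names for illustration, not part of iTextSharp, and the code only needs the standard library so it can be tried on hand-built data.

```csharp
using System;
using System.Collections.Generic;

static class BookmarkTree
{
    // Hypothetical helper (not an iTextSharp API): returns every bookmark
    // title in document order, indented two spaces per nesting level.
    public static List<string> Flatten(IList<Dictionary<string, object>> bookmarks, int depth = 0)
    {
        var lines = new List<string>();
        if (bookmarks == null)
        {
            return lines; // no bookmarks at this level
        }

        foreach (Dictionary<string, object> bookmark in bookmarks)
        {
            // Assumption: the bookmark text is stored under the "Title" key.
            object title;
            bookmark.TryGetValue("Title", out title);
            lines.Add(new string(' ', depth * 2) + (title ?? "(untitled)"));

            // Assumption: children are stored under "Kids" as a list of the same shape.
            object kids;
            if (bookmark.TryGetValue("Kids", out kids))
            {
                lines.AddRange(Flatten(kids as IList<Dictionary<string, object>>, depth + 1));
            }
        }

        return lines;
    }
}
```

With the snippet above you would call BookmarkTree.Flatten(bookmarks) and print each returned line.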

你好瞎i
#3 · 2019-02-21 13:25

If a commercial library is an option for you, you could give Amyuni PDF Creator .Net a try.

Use Amyuni.PDFCreator.IacDocument.RootBookmark to retrieve the root of the bookmark tree, then use the properties of IacBookmark to access each tree element, navigate through the tree, and add, edit or remove elements if needed.

Usual disclaimer applies

祖国的老花朵
#4 · 2019-02-21 13:33

You can use the PDFsharp library. It is published under the MIT License so it can be used even in corporate development. Here is an untested example.

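using System;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

// PdfReader and PdfDocumentOpenMode live in the PdfSharp.Pdf.IO namespace.
using (PdfDocument document = PdfReader.Open("bookmarked.pdf", PdfDocumentOpenMode.Import))
{
    // The catalog's /Outlines entry is the root of the bookmark tree;
    // it is missing when the PDF has no bookmarks.
    PdfDictionary outline = document.Internals.Catalog.Elements.GetDictionary("/Outlines");
    if (outline != null)
    {
        PrintBookmark(outline);
    }
}

// Prints a bookmark's title, then recurses into its children,
// which are chained together through /First and /Next.
void PrintBookmark(PdfDictionary bookmark)
{
    Console.WriteLine(bookmark.Elements.GetString("/Title"));
    for (PdfDictionary child = bookmark.Elements.GetDictionary("/First"); child != null; child = child.Elements.GetDictionary("/Next"))
    {
        PrintBookmark(child);
    }
}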

Gotchas:

  • PDFsharp doesn't open newer PDFs very well. Files that use cross-reference (iref) streams, introduced with PDF 1.5 (Acrobat 6), throw: "Cannot handle iref streams. The current implementation of PDFsharp cannot handle this PDF feature introduced with Acrobat 6."
  • PDFs contain several types of strings, and PDFsharp returns them as is, including UTF-16BE strings, so a title may need decoding before you display it (see ISO 32000-1:2008, section 7.9.2.1).
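
For the second gotcha, here is a rough sketch of decoding such a title. It assumes the library hands back the raw bytes one per char, and that a UTF-16BE text string starts with the byte order mark FE FF as the PDF specification describes. PdfTextString and Decode are hypothetical names, not part of PDFsharp. Strings without the marker are returned unchanged, which is only an approximation for PDFDocEncoding.

```csharp
using System;
using System.Text;

static class PdfTextString
{
    // Hypothetical helper (not a PDFsharp API): decodes a PDF text string
    // that was returned "as is", one raw byte per char.
    public static string Decode(string raw)
    {
        // UTF-16BE text strings start with the byte order mark FE FF.
        if (raw == null || raw.Length < 2 || raw[0] != '\u00FE' || raw[1] != '\u00FF')
        {
            return raw; // no marker: leave the string unchanged
        }

        // Assumption: every char holds a single byte value (0..255).
        byte[] bytes = new byte[raw.Length - 2];
        for (int i = 2; i < raw.Length; i++)
        {
            bytes[i - 2] = (byte)raw[i];
        }

        return Encoding.BigEndianUnicode.GetString(bytes);
    }
}
```

In the PrintBookmark example you would wrap the title, e.g. PdfTextString.Decode(bookmark.Elements.GetString("/Title")).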
可以哭但决不认输i
#5 · 2019-02-21 13:37

You might try the Docotic.Pdf library for the task if you are fine with a commercial solution.

Here is sample code that lists all top-level bookmark items with some of their properties.

using (PdfDocument doc = new PdfDocument("file.pdf"))
{
    PdfOutlineItem root = doc.OutlineRoot;
    foreach (PdfOutlineItem item in root.Children)
    {
        Console.WriteLine("{0} ({1} child nodes, points to page {2})",
            item.Title, item.ChildCount, item.PageIndex);
    }
}

The PdfOutlineItem class also provides properties related to outline item styles and more.

Disclaimer: I work for the vendor of the library.
