Reading PDF Bookmarks in VB.NET using iTextSharp

Posted 2020-03-30 01:42

Question:

I am making a tool that scans PDF files and searches for text in both the PDF bookmarks and the body text. I am using Visual Studio 2008, VB.NET, and iTextSharp.

How do I load the list of bookmarks from an existing PDF file?

Answer 1:

It depends on what you mean by "bookmarks".

You want the outlines (the entries that are visible in the bookmarks panel):

The CreateOutlineTree example shows how to use the SimpleBookmark class to create an XML file containing the complete outline tree (in PDF jargon, bookmarks are called outlines).

Java:

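PdfReader reader = new PdfReader(src);
// Returns the outline tree as a list of maps
List<HashMap<String, Object>> list = SimpleBookmark.getBookmark(reader);
reader.close();
// try-with-resources closes the output file even if the export fails
try (FileOutputStream out = new FileOutputStream(dest)) {
    // true = escape all non-ASCII characters in the XML output
    SimpleBookmark.exportToXML(list, out, "ISO8859-1", true);
}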

C#:

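PdfReader reader = new PdfReader(pdfIn);
// Returns the outline tree as a list of dictionaries
var list = SimpleBookmark.GetBookmark(reader);
reader.Close();
using (MemoryStream ms = new MemoryStream()) {
    // true = escape all non-ASCII characters in the XML output
    SimpleBookmark.ExportToXML(list, ms, "ISO8859-1", true);
    ms.Position = 0; // rewind so the XML can be read back as a string
    using (StreamReader sr = new StreamReader(ms)) {
        return sr.ReadToEnd();
    }
}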

The list object can also be used to examine the different bookmark elements one by one programmatically (this is all explained in the official documentation).
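Since the question is about searching bookmark text, here is a minimal sketch of walking that list in Java. It does not call iText itself: it uses a hand-built list in the shape SimpleBookmark.getBookmark returns, with the title of each entry under the "Title" key and nested entries under the "Kids" key. The class name BookmarkSearch and the method findTitles are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;

final class BookmarkSearch {

    /** Collects, depth-first, every bookmark title that contains the needle (case-insensitive). */
    @SuppressWarnings("unchecked")
    static List<String> findTitles(List<HashMap<String, Object>> bookmarks, String needle) {
        List<String> hits = new ArrayList<>();
        if (bookmarks == null) {
            // getBookmark returns null when the PDF has no outline tree
            return hits;
        }
        String wanted = needle.toLowerCase(Locale.ROOT);
        for (HashMap<String, Object> entry : bookmarks) {
            Object title = entry.get("Title");
            if (title != null && title.toString().toLowerCase(Locale.ROOT).contains(wanted)) {
                hits.add(title.toString());
            }
            // Child bookmarks, if any, are stored as a nested list under "Kids"
            Object kids = entry.get("Kids");
            if (kids instanceof List) {
                hits.addAll(findTitles((List<HashMap<String, Object>>) kids, needle));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hand-built stand-in for SimpleBookmark.getBookmark(reader)
        HashMap<String, Object> child = new HashMap<>();
        child.put("Title", "1.1 Installing iText");
        List<HashMap<String, Object>> kids = new ArrayList<>();
        kids.add(child);

        HashMap<String, Object> parent = new HashMap<>();
        parent.put("Title", "Chapter 1: Introduction");
        parent.put("Kids", kids);

        List<HashMap<String, Object>> outline = new ArrayList<>();
        outline.add(parent);

        System.out.println(findTitles(outline, "install")); // prints [1.1 Installing iText]
    }
}
```

With a real file, the list returned by SimpleBookmark.getBookmark(reader) would be passed in place of the hand-built outline.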

You want the named destinations (specific places in the document you can link to by name):

If you meant named destinations instead, you need the SimpleNamedDestination class, as shown in the LinkActions example:

Java:

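PdfReader reader = new PdfReader(src);
// false = read the named destinations stored as PDF string objects
HashMap<String,String> map = SimpleNamedDestination.getNamedDestination(reader, false);
reader.close();
// try-with-resources closes the output file even if the export fails
try (FileOutputStream out = new FileOutputStream(dest)) {
    SimpleNamedDestination.exportToXML(map, out, "ISO8859-1", true);
}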

C#:

PdfReader reader = new PdfReader(src);
// false = read the named destinations stored as PDF string objects
Dictionary<string,string> map = SimpleNamedDestination
      .GetNamedDestination(reader, false);
reader.Close();
using (MemoryStream ms = new MemoryStream()) {
    SimpleNamedDestination.ExportToXML(map, ms, "ISO8859-1", true);
    ms.Position = 0; // rewind so the XML can be read back as a string
    using (StreamReader sr = new StreamReader(ms)) {
        return sr.ReadToEnd();
    }
}

The map object can also be used to examine the different named destinations one by one programmatically. Note the Boolean parameter that is used when retrieving the named destinations. Named destinations can be stored using a PDF name object as name, or using a PDF string object. The Boolean parameter indicates whether you want the former (true = stored as PDF name objects) or the latter (false = stored as PDF string objects) type of named destinations.
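As a rough illustration of examining that map in Java, the sketch below looks up destinations by name and reads the page each one points to. It uses a hand-built map rather than a real PDF, and it assumes each value starts with the page number followed by the destination type and its parameters (for example "3 XYZ 36 806 0"); check that layout against your own files. The class name NamedDestinationLookup and the method findPages are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class NamedDestinationLookup {

    /** Returns name -> page number for every destination whose name contains the needle. */
    static Map<String, Integer> findPages(Map<String, String> destinations, String needle) {
        Map<String, Integer> pages = new TreeMap<>(); // sorted by name for stable output
        for (Map.Entry<String, String> e : destinations.entrySet()) {
            if (!e.getKey().contains(needle)) {
                continue;
            }
            // Assumed value layout: "<page> <type> <params...>"
            String firstToken = e.getValue().trim().split("\\s+")[0];
            try {
                pages.put(e.getKey(), Integer.parseInt(firstToken));
            } catch (NumberFormatException ex) {
                // Skip values that do not start with a page number
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        // Hand-built stand-in for SimpleNamedDestination.getNamedDestination(reader, false)
        Map<String, String> map = new HashMap<>();
        map.put("chapter-intro", "1 XYZ 36 806 0");
        map.put("chapter-setup", "4 FitH 500");
        map.put("glossary", "12 Fit");

        System.out.println(findPages(map, "chapter")); // prints {chapter-intro=1, chapter-setup=4}
    }
}
```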

Named destinations are predefined targets in a PDF file that can be found through their name. Although the official name is named destinations, some people refer to them as bookmarks too (but when we say bookmarks in the context of PDF, we usually want to refer to outlines).



Answer 2:

In case someone is still looking for a VB.NET solution, here is a simplified one. I have a large number of PDFs created with Report Builder, and the document map automatically adds a bookmark to each of them. With iTextSharp I read the PDF and extract just the "Title" value of the first bookmark:

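    ' Assumes Imports iTextSharp.text.pdf; the late binding below needs Option Strict Off
    Dim oReader As New iTextSharp.text.pdf.PdfReader(PdfFileName)
    ' GetBookmark returns Nothing when the PDF has no outline tree
    Dim list As Object = SimpleBookmark.GetBookmark(oReader)
    oReader.Close()
    Dim string_book As String = Nothing
    If list IsNot Nothing AndAlso list.Count > 0 Then
        string_book = CStr(list(0).Item("Title"))
    End If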

It is only a small, simple example, but it may help someone looking for a starting point to understand how it works.