I am using the iTextSharp library and C#.NET to split my PDF file.
Consider a PDF file named sample.pdf containing 72 pages. Some of its pages have hyperlinks that navigate to other pages. E.g., page 4 has three hyperlinks which, when clicked, navigate to pages 24, 27 and 28 respectively. Like page 4, nearly 12 pages contain such hyperlinks.
Using the iTextSharp library I have split this PDF into 72 separate files, saved as 1.pdf, 2.pdf, ... 72.pdf. So in 4.pdf, clicking those hyperlinks needs to open 24.pdf, 27.pdf and 28.pdf.
Please help me out here. How can I edit the hyperlinks in 4.pdf so that they navigate to the corresponding PDF files?
Thank you,
Ashok
What you want is quite possible, but it will require you to work with the low-level PDF objects (PdfDictionary, PdfArray, etc.).
And whenever someone needs to work with those objects, I always refer them to the PDF Reference. In your case, you'll want to examine chapter 7 (particularly section 3) and chapter 12, sections 3 (doc-level navigation) and 5 (annotations).
Assuming you've read that, here's what you need to do:
1. Step through the annotation array of each page (in the original doc, before breaking it up).
   1.1. Find all the link annotations and their destinations.
   1.2. Build a new destination for each link, corresponding to the new file.
   1.3. Write that new destination into the link annotation.
2. Write this page into a new PDF using PdfCopy (it'll copy the annotations as well as the page content).
Step 1.1 isn't simple. There are several different kinds of "local goto" annotation formats. You need to determine which page a given link points to. Some links might say the PDF equivalent of "next page" or "previous page", while others will include a reference to a particular page. This will be an "indirect object reference", not a page number.
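For the "next page" / "previous page" style of link, the target page can be worked out from the current page number alone. The helper below is a minimal sketch of that idea; the class and method names are made up for illustration, and it only knows the four standard named actions (FirstPage, LastPage, NextPage, PrevPage), clamping at the ends of the document.

```csharp
using System;

static class NamedActionResolver
{
    // Hypothetical helper: maps a named action (name without the leading slash)
    // to a 1-based page number, or null if this sketch does not recognise the name.
    public static int? Resolve(string name, int currentPage, int pageCount)
    {
        switch (name)
        {
            case "FirstPage": return 1;
            case "LastPage":  return pageCount;
            // Links that would run past either end are clamped to the document.
            case "NextPage":  return Math.Min(currentPage + 1, pageCount);
            case "PrevPage":  return Math.Max(currentPage - 1, 1);
            default:          return null;
        }
    }
}
```

For example, `NamedActionResolver.Resolve("NextPage", 4, 72)` gives 5, so that link in 4.pdf would have to point at 5.pdf.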
To determine the page number from a page reference, you need to... ouch. Okay. The most efficient way would be to ask the PdfReader for each page's indirect reference (in iTextSharp 5 I believe the method is PdfReader.GetPageOrigRef(int pageNum)) for every page in the original document, and cache the results in a map (reference->pageNum).
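A rough sketch of that cache, assuming iTextSharp 5.x and that the method returning a page's indirect reference is `GetPageOrigRef`. The class and method names are illustrative only. The map is keyed by the reference's object number, which I am assuming is enough to identify a page within one file.

```csharp
using System.Collections.Generic;
using iTextSharp.text.pdf;

static class PageRefIndex
{
    // Builds a map: object number of each page's indirect reference -> 1-based page number.
    public static Dictionary<int, int> Build(PdfReader reader)
    {
        var map = new Dictionary<int, int>();
        for (int page = 1; page <= reader.NumberOfPages; page++)
        {
            PRIndirectReference pageRef = reader.GetPageOrigRef(page);
            map[pageRef.Number] = page;
        }
        return map;
    }

    // Looks up the page number for the first element of a destination array.
    // Returns null when that element is not an indirect reference to a known page.
    public static int? PageNumberOf(PdfObject destTarget, Dictionary<int, int> map)
    {
        var reference = destTarget as PdfIndirectReference;
        if (reference == null) return null;
        int page;
        return map.TryGetValue(reference.Number, out page) ? page : (int?)null;
    }
}
```

Building the map is one pass over the pages, and each link lookup afterwards is a dictionary hit rather than a scan over every page.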
You can then build "remote goto" links by creating a remote goto PdfAction, and writing that into the link annotation's "A" (action) entry, removing anything that was there before (probably a "Dest").
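Putting the pieces together, here is an untested sketch, assuming iTextSharp 5.x. It rewrites each link that has an explicit page destination into a remote goto pointing at page 1 of "<n>.pdf", then writes every page out with PdfCopy. The names `LinkRewriter`, `SplitWithRemoteLinks` and `outputFolder` are made up for the example. Named actions and named destinations are skipped here and would need their own handling.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

static class LinkRewriter
{
    public static void SplitWithRemoteLinks(string sourcePath, string outputFolder)
    {
        var reader = new PdfReader(sourcePath);
        try
        {
            // Cache: object number of each page reference -> page number.
            var pageByObjNum = new Dictionary<int, int>();
            for (int p = 1; p <= reader.NumberOfPages; p++)
                pageByObjNum[reader.GetPageOrigRef(p).Number] = p;

            // Rewrite the links in memory, before any page is copied.
            for (int p = 1; p <= reader.NumberOfPages; p++)
            {
                PdfArray annots = reader.GetPageN(p).GetAsArray(PdfName.ANNOTS);
                if (annots == null) continue;
                foreach (PdfObject item in annots.ArrayList)
                {
                    var annot = PdfReader.GetPdfObject(item) as PdfDictionary;
                    if (annot == null || !PdfName.LINK.Equals(annot.Get(PdfName.SUBTYPE))) continue;

                    // The destination is either a direct /Dest or the /D of a /GoTo action.
                    PdfArray dest = annot.GetAsArray(PdfName.DEST);
                    if (dest == null)
                    {
                        PdfDictionary action = annot.GetAsDict(PdfName.A);
                        if (action != null && PdfName.GOTO.Equals(action.Get(PdfName.S)))
                            dest = action.GetAsArray(PdfName.D);
                    }
                    if (dest == null || dest.ArrayList.Count == 0) continue;

                    var pageRef = dest.ArrayList[0] as PdfIndirectReference;
                    int target;
                    if (pageRef == null || !pageByObjNum.TryGetValue(pageRef.Number, out target)) continue;

                    // Replace whatever was there with a remote goto to page 1 of the target file.
                    annot.Remove(PdfName.DEST);
                    annot.Put(PdfName.A, new PdfAction(target + ".pdf", 1));
                }
            }

            // Write each page (with its rewritten annotations) to its own file.
            for (int p = 1; p <= reader.NumberOfPages; p++)
            {
                var doc = new Document();
                using (var fs = new FileStream(Path.Combine(outputFolder, p + ".pdf"), FileMode.Create))
                {
                    var copy = new PdfCopy(doc, fs);
                    doc.Open();
                    copy.AddPage(copy.GetImportedPage(reader, p));
                    doc.Close();
                }
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

The file names in the new actions are relative, so 1.pdf through 72.pdf need to stay in the same folder for the links to resolve.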
I don't speak C# very well, so I'll leave the actual implementation to you.
Alright, based on what @Mark Storer said, here's some starter code. The first method creates a sample PDF with 10 pages and some links on the first page that jump around to different parts of the PDF, so we have something to work with. The second method opens the PDF created by the first method and walks through each annotation, trying to figure out which page the annotation links to, and outputs it to the TRACE window. The code is in VB but should be easily converted to C# if needed. It's targeting iTextSharp 5.1.1.0.
If I get a chance I might try to take this further and actually split and re-link things, but I don't have time right now.
Option Explicit On
Option Strict On

Imports iTextSharp.text
Imports iTextSharp.text.pdf
Imports System.IO

Public Class Form1
    ''//Folder that we are working in
    Private Shared ReadOnly WorkingFolder As String = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Hyperlinked PDFs")
    ''//Sample PDF
    Private Shared ReadOnly BaseFile As String = Path.Combine(WorkingFolder, "Sample.pdf")

    Private Shared Sub CreateSamplePdf()
        ''//Create our output directory if it does not exist
        Directory.CreateDirectory(WorkingFolder)
        ''//Create our sample PDF
        Using Doc As New iTextSharp.text.Document(PageSize.LETTER)
            Using FS As New FileStream(BaseFile, FileMode.Create, FileAccess.Write, FileShare.Read)
                Using writer = PdfWriter.GetInstance(Doc, FS)
                    Doc.Open()
                    ''//Turn our hyperlinks blue
                    Dim BlueFont As Font = FontFactory.GetFont("Arial", 12, iTextSharp.text.Font.NORMAL, iTextSharp.text.BaseColor.BLUE)
                    ''//Create 10 pages with simple labels on them
                    For I = 1 To 10
                        Doc.NewPage()
                        Doc.Add(New Paragraph(String.Format("Page {0}", I)))
                        ''//On the first page add some links
                        If I = 1 Then
                            ''//Go to pages relative to this page
                            Doc.Add(New Paragraph(New Chunk("First Page", BlueFont).SetAction(New PdfAction(PdfAction.FIRSTPAGE))))
                            Doc.Add(New Paragraph(New Chunk("Next Page", BlueFont).SetAction(New PdfAction(PdfAction.NEXTPAGE))))
                            ''//This one does not make sense on page 1 but is here for completeness
                            Doc.Add(New Paragraph(New Chunk("Prev Page", BlueFont).SetAction(New PdfAction(PdfAction.PREVPAGE))))
                            Doc.Add(New Paragraph(New Chunk("Last Page", BlueFont).SetAction(New PdfAction(PdfAction.LASTPAGE))))
                            ''//Go to a specific hard-coded page number
                            Doc.Add(New Paragraph(New Chunk("Go to page 5", BlueFont).SetAction(PdfAction.GotoLocalPage(5, New PdfDestination(0), writer))))
                        End If
                    Next
                    Doc.Close()
                End Using
            End Using
        End Using
    End Sub
    Private Shared Sub ListPdfLinks()
        ''//Open our reader
        Dim R As New PdfReader(BaseFile)
        Try
            ''//Get the page count
            Dim PageCount As Integer = R.NumberOfPages
            ''//Loop through each page
            For I = 1 To PageCount
                ''//Get the current page
                Dim PageDictionary As PdfDictionary = R.GetPageN(I)
                ''//Get all of the annotations for the current page
                Dim Annots As PdfArray = PageDictionary.GetAsArray(PdfName.ANNOTS)
                ''//Make sure we have something
                If (Annots Is Nothing) OrElse (Annots.ArrayList.Count = 0) Then Continue For
                ''//Loop through each annotation
                For Each A In Annots.ArrayList
                    ''//PdfReader.GetPdfObject resolves an indirect reference to the object it points at
                    Dim AnnotationDictionary = TryCast(PdfReader.GetPdfObject(A), PdfDictionary)
                    If AnnotationDictionary Is Nothing Then Continue For
                    ''//Make sure this annotation is a link
                    If Not PdfName.LINK.Equals(AnnotationDictionary.Get(PdfName.SUBTYPE)) Then Continue For
                    ''//Get the ACTION for the current annotation (GetAsDict also resolves an indirect reference)
                    Dim AnnotationAction = AnnotationDictionary.GetAsDict(PdfName.A)
                    ''//Make sure this annotation has an ACTION
                    If AnnotationAction Is Nothing Then Continue For
                    ''//Test if it is a named action such as /FirstPage, /LastPage, etc.
                    If PdfName.NAMED.Equals(AnnotationAction.Get(PdfName.S)) Then
                        Dim ActionName = AnnotationAction.Get(PdfName.N)
                        Trace.Write("GOTO:")
                        If PdfName.FIRSTPAGE.Equals(ActionName) Then
                            Trace.WriteLine(1)
                        ElseIf PdfName.NEXTPAGE.Equals(ActionName) Then
                            ''//Any links that go past the end of the document should just go to the last page
                            Trace.WriteLine(Math.Min(I + 1, PageCount))
                        ElseIf PdfName.LASTPAGE.Equals(ActionName) Then
                            Trace.WriteLine(PageCount)
                        ElseIf PdfName.PREVPAGE.Equals(ActionName) Then
                            ''//Any links that go before the first page should just go to the first page
                            Trace.WriteLine(Math.Max(I - 1, 1))
                        End If
                    ''//Otherwise see if it's a GOTO page action
                    ElseIf PdfName.GOTO.Equals(AnnotationAction.Get(PdfName.S)) Then
                        ''//Make sure that it has a destination array
                        Dim Dest = AnnotationAction.GetAsArray(PdfName.D)
                        If (Dest Is Nothing) OrElse (Dest.ArrayList.Count = 0) Then Continue For
                        ''//The first element of the destination is an indirect reference to the target page
                        ''//(the remaining elements hold the fitting options).
                        Dim PageRef = TryCast(Dest.ArrayList(0), PRIndirectReference)
                        If PageRef Is Nothing Then Continue For
                        ''//Resolve the reference to the page object it points at.
                        ''//BIG NOTE: This really should have more sanity checks in place
                        Dim AnnotationReferencedPage = PdfReader.GetPdfObject(PageRef)
                        Trace.Write("GOTO:")
                        ''//Re-loop through all of the pages in the main document comparing them to this page.
                        ''//Very inefficient, but I could not find a way to get the page number based on the reference.
                        For J = 1 To PageCount
                            If AnnotationReferencedPage.Equals(R.GetPageN(J)) Then
                                Trace.WriteLine(J)
                                Exit For
                            End If
                        Next
                    End If
                Next
            Next
        Finally
            ''//Release the file
            R.Close()
        End Try
    End Sub

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        CreateSamplePdf()
        ListPdfLinks()
        Me.Close()
    End Sub
End Class
The function below uses iTextSharp to:
1. Open the PDF
2. Page through the PDF
3. Inspect the annotations on each page for those that are ANCHORS
Step #4 is to insert whatever logic you want in here... update the links, log them, etc.
// Requires: using System; using iTextSharp.text.pdf; using iTextSharp.text.exceptions;
/// <summary>Inspects PDF files for link annotations and reads each link's URI.
/// </summary>
public static void FindPdfDocsWithInternalLinks()
{
    // PdfFiles is assumed to be a collection of FileInfo defined elsewhere
    foreach (var fi in PdfFiles) {
        PdfReader reader = null;
        try {
            reader = new PdfReader(fi.FullName);
            // Pagination
            for (var i = 1; i <= reader.NumberOfPages; i++) {
                var pageDict = reader.GetPageN(i);
                var annotArray = pageDict.GetAsArray(PdfName.ANNOTS);
                if (annotArray == null) continue;
                // check every annotation on the page
                foreach (var annot in annotArray.ArrayList) {
                    var annotDict = PdfReader.GetPdfObject(annot) as PdfDictionary;
                    if (annotDict == null) continue;
                    // compare names directly; this is also safe when /Subtype is missing
                    if (!PdfName.LINK.Equals(annotDict.Get(PdfName.SUBTYPE))) continue;
                    var linkDict = annotDict.GetAsDict(PdfName.A);
                    if (linkDict == null) continue;
                    // if it makes it this far, it's an Anchor annotation,
                    // so we can grab its URI (absent for non-URI actions such as GoTo)
                    var uri = linkDict.Get(PdfName.URI);
                    if (uri == null) continue;
                    var sUri = uri.ToString();
                    if (String.IsNullOrEmpty(sUri)) continue;
                    // DO WHATEVER YOU WANT WITH sUri HERE
                }
            }
        }
        catch (InvalidPdfException e)
        {
            if (!fi.FullName.Contains("_vti_cnf"))
                Console.WriteLine("\r\nInvalid PDF Exception\r\nFilename: " + fi.FullName + "\r\nException:\r\n" + e);
        }
        catch (NullReferenceException e)
        {
            if (!fi.FullName.Contains("_vti_cnf"))
                Console.WriteLine("\r\nNull Reference Exception\r\nFilename: " + fi.Name + "\r\nException:\r\n" + e);
        }
        finally
        {
            // always release the file, even when parsing failed
            if (reader != null) reader.Close();
        }
    }
}
Good luck.