I have some PDFs, each with two attached files that have static names. I would like to use iTextSharp to extract these files to a temp directory so that I can work with them further. I tried following the tutorial here, but I ran into problems because iTextSharp.text.pdf.PdfReader doesn't have a getCatalog() method as shown in the bottom example.
Any advice on how I can extract the attachments? Let's just say for ease that the PDF document is at "C:\test.pdf" and the two attachments are stored as "attach1.xml" and "attach2.xml".
I ended up finding a way to do this, although not exactly programmatically. I included a binary called "pdftk.exe" (PDF Toolkit), which has command-line options for extracting attachments.
To clarify, I added pdftk.exe to the project, then called it via
Process.Start("./pdftk", "contains_attachments.pdf unpack_files output \"C:\\output_directory\"")
Note that pdftk will not output to a folder with a trailing backslash. You can find pdftk here: http://www.accesspdf.com/pdftk/
After adding the .exe file to the project, you need to set its "Copy to Output Directory" property to "Copy always" or "Copy if newer".
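A slightly fuller sketch of the same call, in case it helps. It strips a trailing backslash from the output folder (since pdftk rejects one), waits for pdftk to finish, and returns its exit code. The class and method names (PdftkUnpacker, BuildArguments, Unpack) are made up for illustration, and it assumes pdftk.exe has been copied next to the application as described above.

```csharp
using System;
using System.Diagnostics;

static class PdftkUnpacker
{
    // Builds the pdftk command line. Trailing slashes are removed from the
    // output directory because pdftk will not write to a path that ends in one.
    // (Hypothetical helper; not part of pdftk or .NET.)
    public static string BuildArguments(string pdfPath, string outputDir)
    {
        string trimmedDir = outputDir.TrimEnd('\\', '/');
        return "\"" + pdfPath + "\" unpack_files output \"" + trimmedDir + "\"";
    }

    // Runs pdftk, blocks until it exits, and returns the process exit code.
    public static int Unpack(string pdftkPath, string pdfPath, string outputDir)
    {
        var startInfo = new ProcessStartInfo(pdftkPath, BuildArguments(pdfPath, outputDir))
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

For the example in the question this could be called as PdftkUnpacker.Unpack("./pdftk", @"C:\test.pdf", @"C:\output_directory\"). Waiting for the process to exit matters here, because otherwise the code that reads attach1.xml and attach2.xml may run before pdftk has written them. Note that a bare drive root such as C:\ would be trimmed to C:, so this sketch assumes a real subfolder.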
I found this solution. I don't know if it's the best way, but it works!
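The getCatalog() method from the Java tutorial shows up in the C# port as the Catalog property on PdfReader. Below is a minimal sketch of extraction along those lines. It assumes iTextSharp 5.x and a PDF whose EmbeddedFiles name tree is flat (a single Names array, no Kids nodes). The class and method names are hypothetical, and this is not necessarily the exact code referred to above.

```csharp
using System.IO;
using iTextSharp.text.pdf;

static class AttachmentExtractor
{
    // Writes every document-level attachment found in pdfPath into outputDir.
    // Assumption: the EmbeddedFiles name tree is a flat /Names array.
    public static void ExtractAttachments(string pdfPath, string outputDir)
    {
        PdfReader reader = new PdfReader(pdfPath);
        try
        {
            // Catalog -> /Names -> /EmbeddedFiles -> /Names [name, filespec, ...]
            PdfDictionary names = reader.Catalog.GetAsDict(PdfName.NAMES);
            PdfDictionary embedded = names == null ? null : names.GetAsDict(PdfName.EMBEDDEDFILES);
            PdfArray entries = embedded == null ? null : embedded.GetAsArray(PdfName.NAMES);
            if (entries == null) return; // no attachments in this layout

            // Entries alternate between a name string and its file specification,
            // so the file specifications sit at the odd indices.
            for (int i = 1; i < entries.Size; i += 2)
            {
                PdfDictionary fileSpec = entries.GetAsDict(i);
                if (fileSpec == null) continue;

                PdfDictionary ef = fileSpec.GetAsDict(PdfName.EF);
                PdfString fileName = fileSpec.GetAsString(PdfName.F);
                if (ef == null || fileName == null) continue;

                PRStream stream = ef.GetAsStream(PdfName.F) as PRStream;
                if (stream == null) continue;

                // GetStreamBytes returns the decoded (uncompressed) file content.
                byte[] data = PdfReader.GetStreamBytes(stream);
                string target = Path.Combine(outputDir, Path.GetFileName(fileName.ToUnicodeString()));
                File.WriteAllBytes(target, data);
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

With the files from the question, AttachmentExtractor.ExtractAttachments(@"C:\test.pdf", Path.GetTempPath()) should leave attach1.xml and attach2.xml in the temp directory, provided the PDF stores them as document-level attachments rather than as file-attachment annotations on a page.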