Merge PDF's with PDFTK with Bookmarks?

Published 2019-01-16 05:01

Posted by 爷的心禁止访问

Question:

Using pdftk to merge multiple PDFs is working well. However, is there an easy way to create a bookmark for each merged PDF?

I don't see anything in the pdftk docs about this, so I don't think it's possible with pdftk.

All of the files we merge will be one page each, so I'm wondering whether another utility could add the bookmarks afterwards.

Or is there another Linux-based PDF utility that can merge while specifying a bookmark for each individual PDF?

Answer 1:

You can also merge multiple PDFs with Ghostscript. The big advantage of this route is that a solution is easily scriptable, and it does not require a real programming effort:

gswin32c.exe ^
          -dBATCH -dNOPAUSE ^
          -sDEVICE=pdfwrite ^
          -sOutputFile=merged.pdf ^
          [...more Ghostscript options as needed...] ^
          input1.pdf input2.pdf input3.pdf [....]

With Ghostscript you'll be able to pass pdfmark statements which can add a Table of Contents as well as bookmarks for each additional source file going into the resulting PDF. For example:

gswin32c.exe ^
          -dBATCH -dNOPAUSE ^
          -sDEVICE=pdfwrite ^
          -sOutputFile=merged.pdf ^
          [...more Ghostscript options as needed...] ^
          file-with-pdfmarks-to-generate-a-ToC.ps ^
          -f input1.pdf input2.pdf input3.pdf [....]

gswin32c.exe ^
          -dBATCH -dNOPAUSE ^
          -sDEVICE=pdfwrite ^
          -sOutputFile=merged.pdf ^
          [...more Ghostscript options as needed...] ^
          file-with-pdfmarks-to-generate-a-ToC.ps ^
          -f input1.pdf ^
             input2.pdf ^
             input3.pdf [....]
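
If you drive this from a script, it can help to assemble the invocation as an argument list instead of a shell string, so file names with spaces need no quoting. A minimal Python 3 sketch, not part of the original answer: the function name `build_gs_command` and its parameters are made up for illustration, and it uses only the Ghostscript switches already shown above.

```python
def build_gs_command(output, inputs, pdfmark_file=None, gs="gs"):
    """Return the Ghostscript argument list for merging `inputs` into `output`.

    `pdfmark_file` is an optional PostScript file with pdfmark statements;
    it is placed before the input PDFs, as in the commands above.
    """
    cmd = [gs, "-dBATCH", "-dNOPAUSE", "-sDEVICE=pdfwrite",
           "-sOutputFile=" + output]
    if pdfmark_file is not None:
        cmd.append(pdfmark_file)
    # -f precedes the input files, as in the commands above
    cmd.append("-f")
    cmd.extend(inputs)
    return cmd


cmd = build_gs_command("merged.pdf",
                       ["input1.pdf", "input2.pdf", "input3.pdf"],
                       pdfmark_file="file-with-pdfmarks-to-generate-a-ToC.ps")
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.check_call(cmd)`; on Windows you would pass `gs="gswin32c.exe"`.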

For some introduction to the pdfmark topic, see also Thomas Merz's PDFmark Primer.

Edit:
I had wanted to give you an example for file-with-pdfmarks-to-generate-a-ToC.ps, but somehow forgot it. Here it is:

[/Page 1 /View [/XYZ null null null] /Title (File 1) /OUT pdfmark
[/Page 2 /View [/XYZ null null null] /Title (File 2) /OUT pdfmark
[/Page 3 /View [/XYZ null null null] /Title (File 3) /OUT pdfmark
[/Page 4 /View [/XYZ null null null] /Title (File 4) /OUT pdfmark

This would create a ToC for the first 4 files == first 4 pages (since you guarantee your ingredient files are 1 page each for your merged output PDF).

The [/XYZ null null null] part makes sure your page viewport and zoom level do not change from the current ones when you follow the link. (To set an explicit position and zoom instead, you could say [/XYZ 222 111 2], to pick an arbitrary example.)
The /Title (some string you want) part determines what text is shown in the ToC.
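
With more than a handful of files you would probably generate these lines rather than type them. A minimal Python 3 sketch under the question's assumption that every source PDF has exactly one page; the function names are hypothetical, and only plain ASCII titles are handled.

```python
def escape_ps_string(text):
    """Escape backslashes and parentheses for a PostScript (...) string."""
    return (text.replace("\\", "\\\\")
                .replace("(", "\\(")
                .replace(")", "\\)"))


def one_page_pdfmarks(titles):
    """Build one bookmark per title; file N is assumed to be page N."""
    lines = []
    for page, title in enumerate(titles, start=1):
        lines.append(
            "[/Page %d /View [/XYZ null null null] /Title (%s) /OUT pdfmark"
            % (page, escape_ps_string(title)))
    return "\n".join(lines) + "\n"


# Print the pdfmark lines for two example titles
print(one_page_pdfmarks(["File 1", "File 2 (draft)"]), end="")
```

Write the returned text to a file such as file-with-pdfmarks-to-generate-a-ToC.ps and pass it to Ghostscript as shown above.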

And, you could even add these parameters to the Ghostscript commandline directly:

gswin32c.exe ^
       -o merged.pdf ^
       -sDEVICE=pdfwrite ^
       [...more Ghostscript options as needed...] ^
       -c "[/Page 1 /View [/XYZ null null null] /Title (File 1) /OUT pdfmark" ^
       -c "[/Page 2 /View [/XYZ null null null] /Title (File 2) /OUT pdfmark" ^
       -c "[/Page 3 /View [/XYZ null null null] /Title (File 3) /OUT pdfmark" ^
       -c "[/Page 4 /View [/XYZ null null null] /Title (File 4) /OUT pdfmark" ^
       -f input1.pdf ^
          input2.pdf ^
          input3.pdf ^
          input4.pdf [....]

'nother Edit:

Oh, and by the way: Ghostscript does preserve the bookmarks when you use it to merge two PDF files into one -- pdftk.exe does not. Let's use the one generated by the command of my first edit (effectively concatenating 2 copies of the same file):

 gswin32c ^
    -sDEVICE=pdfwrite ^
    -o doublemerged.pdf ^
     merged.pdf ^
     merged.pdf

The file doublemerged.pdf will now have 2*4 = 8 bookmarks.

As expected, bookmarks 1, 2, 3 and 4 link to pages 1, 2, 3 and 4.
The problem is that bookmarks 5, 6, 7 and 8 also link to pages 1, 2, 3 and 4.

The reason is that the pre-existing bookmarks address their link targets by absolute page number. To work around that (and get bookmarks that work in merged files), one would have to generate bookmarks which point to their link targets by named destination (and make sure the names are unique across the documents being merged).
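
To illustrate the named-destination idea, here is a hedged Python 3 sketch that emits a /DEST pdfmark per page plus a bookmark that refers to it by name. The naming scheme (a per-document prefix plus the page number) and the function name are my own assumptions, titles are assumed to contain no parentheses or backslashes, and I have not verified that the names survive a later Ghostscript merge intact.

```python
import re


def named_destination_pdfmarks(doc_id, titles):
    """Emit a named destination and a bookmark for each one-page entry.

    `doc_id` is a hypothetical per-document prefix that keeps destination
    names unique across the documents being merged.
    """
    if not re.fullmatch(r"[A-Za-z0-9_.-]+", doc_id):
        raise ValueError("doc_id must be a plain ASCII identifier")
    lines = []
    for page, title in enumerate(titles, start=1):
        name = "%s.p%d" % (doc_id, page)
        # define the named destination on this page ...
        lines.append(
            "[/Dest /%s /Page %d /View [/XYZ null null null] /DEST pdfmark"
            % (name, page))
        # ... and let the bookmark refer to it by name, not by page number
        lines.append("[/Dest /%s /Title (%s) /OUT pdfmark" % (name, title))
    return lines


for line in named_destination_pdfmarks("docA", ["File 1", "File 2"]):
    print(line)
```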

(This approach also works on Linux: just use gs instead of gswin32c, and \ instead of ^ for line continuation.)

Appendix

The command lines above use [...more Ghostscript options as needed...] as a placeholder for more options.

If you do not use other options, Ghostscript will apply its built-in defaults for various parameters. However, this may give you results which are not to your liking. Since Ghostscript generates a completely new PDF based on the input, some of the original objects may be changed. This is true for color spaces and for image compression levels.

How to apply parameters which leave the originally embedded images unchanged can be seen over at SuperUser: "Use Ghostscript, but tell it to not reprocess images".

Answer 2:

I know other ways to do this have already been mentioned, but with pdftk you can take the merged PDF and add bookmarks to it: use the pdftk operation dump_data to write the PDF's existing metadata to a .info file, then add bookmark information to that file by adding the following four lines for each bookmark:

BookmarkBegin
BookmarkTitle: name
BookmarkLevel: level
BookmarkPageNumber: page number

Then use the update_info operation to update the merged PDF's bookmarks with the ones you wrote to the .info file. I have written some simple AutoHotkey functions that do this for me, if anyone is interested. See http://www.autohotkey.com/board/topic/98985-scripts-to-merge-pdfs-and-add-bookmarks-with-pdftk/
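
For anyone not using AutoHotkey, the same records can be produced in a few lines of Python 3. This is only a sketch: `bookmark_records` is a hypothetical helper, and the file names in the note below (merged.pdf, merged.info, bookmarked.pdf) are placeholders.

```python
def bookmark_records(bookmarks):
    """Format (title, level, page_number) tuples as pdftk bookmark records."""
    lines = []
    for title, level, page in bookmarks:
        lines.append("BookmarkBegin")
        lines.append("BookmarkTitle: %s" % title)
        lines.append("BookmarkLevel: %d" % level)
        lines.append("BookmarkPageNumber: %d" % page)
    return "\n".join(lines) + "\n"


# One top-level bookmark per merged one-page file
print(bookmark_records([("File 1", 1, 1), ("File 2", 1, 2)]), end="")
```

The surrounding workflow would then be roughly: `pdftk merged.pdf dump_data output merged.info`, add the generated records to merged.info, then `pdftk merged.pdf update_info merged.info output bookmarked.pdf`.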

Answer 3:

See this answer at https://stackoverflow.com/a/17781138/547578. I used something called Sejda. It works. It combines the bookmarks perfectly. Thanks @blablatros.

Answer 4:

To add or edit PDF bookmarks you could use JPdfBookmarks. It is an excellent multi-OS Free Software tool that I have been using for a while now with excellent results. It deals with bookmarks only, though, so you would need another tool to merge or reorder pages. In addition to pdftk, I suggest trying PDF Split and Merge (good app, but weird UI, and in my experience it messes up bookmarks), PDF-Shuffler (seems to work fine, but sometimes freezes on certain files), or PdfMod (potentially the best, as it handles rearranging, merging and bookmarks, although I have not been able to figure out how to insert PDFs at a specific page).

Sorry for not providing some links, as a newbie the system only allows me to add 2 hyperlinks.

Answer 5:

@pipitas's good answer doesn't solve the bookmark issue completely. There is a related question on Unix & Linux Stack Exchange, https://unix.stackexchange.com/questions/17065/add-and-edit-bookmarks-to-pdf/31070 , where I suggest:

If you still want to stick with those Unix scripts, then

extract the bookmark data dumped by pdftk
write one extra script to convert the dumped bookmark data to the pdfmarks format that the Ghostscript command gs accepts
use gs to merge the files together with the pdfmarks

The script exists already; see pdf-merge.py from Merge PDF's with PDFTK with Bookmarks?

Answer 6:

Maybe the following is helpful. I wanted to merge all PDFs (in_nn.pdf) located in one directory into a single out.pdf that uses the names of the input PDFs (in_nn) as its ToC. I wrote a Python script which reads the names, extracts the page counts and generates a file named pdfmarks. Merging the files is then easily done with gs. The exact command is printed by the script and must be executed separately (maybe with some modifications for page size adjustments or for the operating system).

Here it is. Perhaps some modifications are necessary for Windows? Just execute the Python script in the directory where the PDFs to be merged are located.

#!/usr/bin/env python

import subprocess

# This script merges a set of PDFs into a single PDF and creates bookmarks for that PDF file.
# This requires a file named pdfmarks, which this script generates.
# Simply run this script in the directory that contains exactly the PDFs to be merged (*pdf, see below).
# The merged PDF is then generated with this command (in bash):
# gs -dBATCH -dNOPAUSE -sPAPERSIZE=A4 -sDEVICE=pdfwrite -sOutputFile="all.pdf" $(ls *pdf ) pdfmarks
# Existing tables of contents are preserved; the new entries are appended to the end of the table of contents.
#
# In principle, pdfmarks looks like this:
#
# [/Title (Nr. 1) /Page 1 /OUT pdfmark
# [/Title (Nr. 2) /Page 5 /OUT pdfmark
# [/Title (Nr. 3) /Page 9 /OUT pdfmark
# etc.

p = subprocess.Popen('ls *pdf', shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, universal_newlines=True)

pdfdateien = []
kombinationen = []

for line in p.stdout.readlines():
  # p yields the names of all PDF files
  pdfdateien.append(line)


for datei in pdfdateien:
  cmd = "pdfinfo %s" % datei
  q = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, universal_newlines=True)
  kombination = [datei]

  for subline in q.stdout.readlines():
    # q yields the output lines of pdfinfo
    if subline.startswith("Pages:"):
      kombination.append(subline)

  kombinationen.append(kombination)


# Now bring kombinationen into the required format:

kombinationen_bereinigt =  []
out_string1 = "[/Title ("
out_string2 = ") /Page "
out_string3 = " /OUT pdfmark\n"
seitenzahl = 1

for kombination in kombinationen:
  # strip the trailing ".pdf\n" (5 characters) from the file name
  dateiname = kombination[0][0:len(kombination[0])-5]

  #
  # Optionally massage dateiname further here,
  # e.g.
  #  lesezeichen = dateiname[0:1]+" "+dateiname[6:8]+"/"+dateiname[1:5]
  lesezeichen = dateiname

  # the page count starts at column 16 of pdfinfo's "Pages:" line
  anz_seiten = kombination[1][16:len(kombination[1])-1]
  seitenzahl_str = str(seitenzahl)

  kombination_bereinigt = out_string1+lesezeichen+out_string2+seitenzahl_str+out_string3
  kombinationen_bereinigt.append(kombination_bereinigt)

  seitenzahl += int(anz_seiten)


# Write the output file
outfile = open("pdfmarks", "w")

for i in kombinationen_bereinigt:
  outfile.write(i)

outfile.close()

# Print the merge command

print("\nFor merging all pdfs execute this (or similar) command (in bash shell):")
print("gs -dBATCH -dNOPAUSE -sPAPERSIZE=A4 -sDEVICE=pdfwrite -sOutputFile=\"all.pdf\" $(ls *pdf ) pdfmarks\n")
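
The heart of the script is the running page total: each bookmark points at the page where its file starts in the merged output. A small Python 3 sketch of just that arithmetic, with hypothetical function names; the page count is taken from pdfinfo's "Pages:" line with a regular expression instead of a fixed column slice.

```python
import re

PAGES_RE = re.compile(r"^Pages:\s+(\d+)\s*$", re.MULTILINE)


def page_count(pdfinfo_output):
    """Extract the page count from the text printed by pdfinfo."""
    match = PAGES_RE.search(pdfinfo_output)
    if match is None:
        raise ValueError("no 'Pages:' line found")
    return int(match.group(1))


def start_pages(page_counts):
    """Return the 1-based page on which each file starts after merging."""
    starts = []
    next_page = 1
    for count in page_counts:
        starts.append(next_page)
        next_page += count
    return starts


print(start_pages([4, 4, 3]))  # -> [1, 5, 9]
```

Files of 4, 4 and 3 pages start on pages 1, 5 and 9, which matches the sample pdfmarks shown in the script's header comment.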

Answer 7:

Unfortunately there is no easy way to do that. You could use the library that pdftk is built upon directly and write either a Java or a .NET program that uses iText or iTextSharp to merge your one-pagers and create the bookmarks. If you want to go the iText route, there are a lot of examples available online or in the iText book (written by the iText author).

... or, let me know what's not working and I can help.

Answer 8:

The following is intended to be a comment to the answer by pdfmerger (https://stackoverflow.com/a/30524828/3915004).

Thanks for your script, pdfmerger! I know the question is tagged linux, but to make your script work on Mac OS X, two things are needed:

ghostscript gs and
the command pdfinfo (which is included e.g. in poppler)

Install them by first getting brew (google it; it is installed via some curl/ruby-magic command ^^ ) and then simply:

brew install ghostscript
brew install poppler

ADD-ON: READ TEXT-FILE WITH CHAPTER TITLES:

To expand on your script: I use this workflow mainly for books available as chapter downloads from the editor's website. A text file containing the chapter names can easily be generated. The following add-on to your code additionally reads a text file 'chapters.txt' containing one line per PDF to merge. (Note: I didn't implement any check that the number of lines matches the number of PDFs.)

Simply expand your script by replacing the following lines:

p = subprocess.Popen('ls *pdf', shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, universal_newlines=True)
c = subprocess.Popen('less chapters.txt', shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, universal_newlines=True)

pdfdateien = []
kombinationen = []
chapternames = []

for line in c.stdout.readlines():
# c contains all chapter-titles
  chapternames.append(line)

for line in p.stdout.readlines():

and

for index, kombination in enumerate(kombinationen):
#  dateiname = kombination[0][0:len(kombination[0])-5]
#
# Optionally massage dateiname further here,
# e.g.
#  lesezeichen = dateiname[0:1]+" "+dateiname[6:8]+"/"+dateiname[1:5]
#  lesezeichen = dateiname
  lesezeichen=chapternames[index][:-1]

  anz_seiten = kombination[1][16:len(kombination[1])-1]
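
The missing check on the number of lines could look roughly like this. It is a Python 3 sketch with a hypothetical helper name, separate from the script above; blank lines in chapters.txt are ignored.

```python
def pair_chapter_titles(pdf_names, chapter_lines):
    """Pair each PDF with its chapter title, failing loudly on a mismatch."""
    titles = [line.strip() for line in chapter_lines if line.strip()]
    if len(titles) != len(pdf_names):
        raise ValueError("%d chapter titles for %d PDFs"
                         % (len(titles), len(pdf_names)))
    return list(zip(pdf_names, titles))


pairs = pair_chapter_titles(["ch01.pdf", "ch02.pdf"],
                            ["Introduction\n", "Methods\n"])
for pdf_name, title in pairs:
    print(pdf_name, "->", title)
```

The pairing relies on the PDFs and the lines of chapters.txt being in the same order, so the file names need to sort the way the chapters are listed.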

Answer 9:

There is PdfMod. It has a graphical interface and lets you add bookmarks manually. Also, if you edit a PDF that already comes with bookmarks, it will update them automatically to point to the correct pages.

Answer 10:

Sejda PDF (which was suggested in one of the answers) is also available as an online service: https://www.sejda.com/merge-pdf.

This might come in handy if you don't want to install any additional software and prefer working online from a browser.

Steps to merge:

Drag and drop all PDF files to the web page
By default all existing bookmarks are preserved and will work in the merged document as well.
Optionally, the merge tool can build a table of contents based on the PDF documents being combined

The online service to merge PDF files is free to use for up to 30 files per hour and files up to 50 MB / 200 pages.

Disclaimer: I'm an open source dev working on Sejda.

Answer 11:

The recent version of pdftk (at least v2.02) handles bookmarks and links correctly:

pdftk file1.pdf file2.pdf cat output merged.pdf

Tags: linux pdf pdf-generation pdftk ghostscriptsharp
