Change metadata of pdf file with pypdf

2019-01-23 17:02发布

问题:

I'd like to create/modify the title of a pdf document using pypdf. It seems that the title is readonly. Is there a way to access this metadata r/w?

If answer positive, a piece of code would be appreciated.

Thanks

回答1:

You can manipulate the title with pyPDF (sort of). I came across this post on the reportlab-users listing:

http://two.pairlist.net/pipermail/reportlab-users/2009-November/009033.html

You can also use pypdf. http://pybrary.net/pyPdf/

This won't let you edit the metadata per se, but will let you read one or more pdf file(s) and spit them back out, possibly with new metadata.

Here's the relevant code:

from pyPdf import PdfFileWriter, PdfFileReader
from pyPdf.generic import NameObject, createStringObject

OUTPUT = 'output.pdf'
INPUTS = ['test1.pdf', 'test2.pdf', 'test3.pdf']

# There is no interface through pyPDF with which to set this other then getting
# your hands dirty like so:
infoDict = output._info.getObject()
infoDict.update({
    NameObject('/Title'): createStringObject(u'title'),
    NameObject('/Author'): createStringObject(u'author'),
    NameObject('/Subject'): createStringObject(u'subject'),
    NameObject('/Creator'): createStringObject(u'a script')
})

inputs = [PdfFileReader(i) for i in INPUTS]
for input in inputs:
    for page in range(input.getNumPages()):
        output.addPage(input.getPage(page))

outputStream = file(OUTPUT, 'wb')
output.write(outputStream)
outputStream.close()