Manipulating PDF file

I would like to read a PDF file as text (PostScript), add new objects to the file structure, and save the result as a new PDF. But if I just copy the PDF's PostScript content and paste it into a newly created PDF file (with encoding='ansi'), the file doesn't work.

I suspect this is an encoding issue, but I am not sure what I should do to get a valid PDF file after manipulating the original PostScript content.

Here is the piece of code that didn't work for me:

pdf_file = open('Input.pdf', 'r', encoding='ansi').read()
pdf_file_bytes = bytearray(pdf_file, 'ansi')
pdf_file = open('Output_bytes.pdf', 'wb').write(pdf_file_bytes)

And as I said, the output PDF is not valid!

Tags: pdf ansi pdf-manipulation

2 Answers

叼着烟拽天下

Post #2 · 2019-08-21 04:41

A PDF file is a complex format consisting of various objects. Unless you understand the low-level syntax of the PDF specification thoroughly, it will be difficult, if not impossible, to arbitrarily replace some bytes with other bytes and still end up with a valid PDF file.

More to the point, what are you trying to accomplish? There may be a high-level way of doing it that doesn't involve manipulating PDF syntax directly, e.g. if you need to modify a font, add an annotation, set the PDF version, etc. Otherwise, if you really do need to modify PDF syntax, you need to use a library capable of dealing with low-level objects.


Luminary・发光体

Post #3 · 2019-08-21 04:55

First problem: the content of a PDF file is PDF, not PostScript.

Secondly, PDF is a binary file format, so if you copy and paste it, any kind of translation (such as CR/LF conversion) will break it.
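
To make the CR/LF point concrete, here is a small sketch, assuming the code is Python 3. The sample bytes and the helper name text_round_trip are made up for illustration; they are not taken from a real PDF. latin-1 is used because it maps every byte to exactly one character, so any difference comes from newline translation alone and not from the codec.

```python
import os
import tempfile

# Hypothetical sample bytes standing in for the start of a PDF: lines
# ending in CR LF plus a few raw (non-ASCII) bytes.
sample = b"%PDF-1.4\r\n%\xe2\xe3\xcf\xd3\r\n1 0 obj\r\n"


def text_round_trip(data: bytes) -> bytes:
    """Write data to a temp file, read it back in text mode, re-encode it."""
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # Text mode with the default newline handling turns "\r\n" and
        # lone "\r" into "\n" while reading.
        with open(path, "r", encoding="latin-1") as f:
            text = f.read()
        return text.encode("latin-1")
    finally:
        os.remove(path)


result = text_round_trip(sample)
print(len(sample), len(result))  # the lengths differ
print(result == sample)          # the bytes are no longer identical
```

Every byte that disappears this way shifts whatever follows it, and a PDF's cross-reference table stores byte offsets, so even this small change is enough to invalidate the file.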

You haven't said what programming language your code uses, though it looks like Python. If it is Python then reading the file as binary instead of text might help.
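
If it is Python, a minimal sketch of the binary approach could look like the following. The function name copy_pdf is hypothetical; the point is only the 'rb' and 'wb' modes, which disable both decoding and newline translation.

```python
def copy_pdf(src: str, dst: str) -> bytes:
    """Copy a file byte for byte and return the bytes that were read."""
    # 'rb' returns the raw bytes: no encoding and no newline translation.
    with open(src, "rb") as f:
        data = f.read()
    # 'wb' writes the bytes back exactly as they are.
    with open(dst, "wb") as f:
        f.write(data)
    return data
```

Calling copy_pdf('Input.pdf', 'Output_bytes.pdf') with the file names from the question should give an output identical to the input. This only covers reading and writing: if you insert new objects into the bytes, the offsets in the cross-reference table still have to be kept consistent, which is where a PDF library helps.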

