I want to merge several PDF documents into one. The source documents can consist of PDFs created by me and others created by other organisations. I have no control over the permissions attached to documents not created by me. Some of these documents (those not created by me) may have permissions set. If a document requires a password to open it I do not attempt to merge it.
I am using iText 5.5.1 (I think that is the latest) to create a PdfCopy object to contain the resulting document, and a reader for each source PDF in a loop (I am passing a list of the documents to be merged). I check each document for the number of pages, then use the PdfCopy object to import each page and add it to the PdfCopy object (these two steps are separate because of the intricacies of the language I am using to work with the Java objects, RPG on an IBM iSeries). The problem is that I can attach a reader to a PDF with permissions and get the page count, but as soon as I try to import a page into the copy object the program complains and terminates with the message 'PdfReader not opened with owner password'. I am not able to get the people providing the documents from other organisations to leave them unprotected (there are very, very good reasons why the original document is protected from change), but I need to consolidate these documents into one.
My question is: can I copy PDFs with permissions into a new document using iText, and can I do it without knowing the owner password? In addition to that, I guess the other question would be: is it legal?
Thanks
GarryM
Introduction: A PDF file can be encrypted using a public certificate. If you have such a PDF, you need the corresponding private key to decrypt it. A PDF file can also be encrypted using two passwords: a user password and an owner password. If the PDF is encrypted using a user password, you need at least one of the two passwords to decrypt it.
Assumption: I assume that the PDFs are encrypted with nothing but an owner password. You can open these documents in a PDF viewer without having to provide a user password, which means the content can be accessed, but there are some restrictions in place depending on the permissions that are set.
Situation: iText is a library that allows you to access PDFs at a very low level, without a GUI. It can easily access a PDF that is encrypted with nothing but an owner password, but it can't check whether you respect the permissions that are defined for the PDF. To make sure that you are aware of your responsibilities, an exception is thrown with the message 'PdfReader not opened with owner password'. This is often too strict: sometimes you do have permission to assemble a PDF file, but with iText it's all or nothing. Either you can open the file, or you can't. iText doesn't check what you're doing afterwards.
Solution: There is a static Boolean parameter called unethicalreading that is set to false by default. You can change it like this:
PdfReader.unethicalreading = true;
From now on, it will be as if the PDFs aren't encrypted.
Is this legal? It's not that clear and I am not a lawyer, but:
It used to be illegal when Adobe still owned the copyright on the PDF specification. Adobe granted the right to use that copyright to any developer on certain conditions. One of these conditions was that you didn't "crack" a PDF. Removing the password from a PDF broke your "contract" with Adobe to use the PDF specification and you risked being sued.
This changed when Adobe donated the PDF specification to the community in order to make it an ISO standard. Now everyone can use this international standard, and the risk described above (being sued by Adobe for infringing the copyright) no longer exists.
As the ISO standard documents the mechanism of encryption with an owner password, and it is very easy to use the ISO standard to decrypt a document without having that password, the concept of introducing an owner password to enforce permissions is flawed from a technical point of view. It's merely a psychological way to discourage people from doing something with your document that you, as an author, do not want.
It's like a stop sign on a deserted road. It says: you should stop here, but nobody/nothing is going to stop you if no one is around.
Suggested approach:
My approach is to decrypt the PDF using the unethicalreading parameter, and to look at the permissions that are set. If the permissions don't allow assembly, I refuse the document. I also set permissions on the resulting PDF, where I try to find the combination of permissions that respects the permissions set on the original documents.
In some cases, it's not that hard: the people who don't know the passwords are often the owners of the documents, who have forgotten the passwords that were used to encrypt them. In that case, simple permission from the owners of the documents is sufficient to decrypt them.
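The permission check described under "Suggested approach" could be sketched roughly as follows. This is only an illustration in plain Java, not iText code: the class PermissionPolicy and its methods are hypothetical names, and the flag values follow the permission bits of the PDF standard's password-based security handler (iText 5 exposes matching constants such as PdfWriter.ALLOW_ASSEMBLY). It assumes you have already obtained each source's permission flags from your reader, and that you pass ALL for sources that are not encrypted. As a simplification, only the dedicated assembly flag is checked.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalLong;

/**
 * Hypothetical helper (not part of iText): refuses a merge when any source
 * forbids assembly, otherwise returns the permissions shared by all sources.
 */
final class PermissionPolicy {
    // Permission flags, numbered as in the PDF standard (bit 1 is the lowest bit).
    static final long PRINT_LOW_RES   = 1L << 2;   // bit 3
    static final long MODIFY_CONTENTS = 1L << 3;   // bit 4
    static final long COPY            = 1L << 4;   // bit 5
    static final long MODIFY_ANNOTS   = 1L << 5;   // bit 6
    static final long FILL_IN         = 1L << 8;   // bit 9
    static final long SCREEN_READERS  = 1L << 9;   // bit 10
    static final long ASSEMBLE        = 1L << 10;  // bit 11
    static final long PRINT_HIGH_RES  = 1L << 11;  // bit 12

    // Assumption: use this value for sources that carry no restrictions at all.
    static final long ALL = PRINT_LOW_RES | MODIFY_CONTENTS | COPY | MODIFY_ANNOTS
            | FILL_IN | SCREEN_READERS | ASSEMBLE | PRINT_HIGH_RES;

    private PermissionPolicy() { }

    /** True if the dedicated assembly flag is set. */
    static boolean allowsAssembly(long permissions) {
        return (permissions & ASSEMBLE) != 0;
    }

    /**
     * Returns the intersection of all source permissions, or an empty
     * result if at least one source does not allow assembly.
     */
    static OptionalLong mergedPermissions(List<Long> sourcePermissions) {
        long result = ALL;
        for (long p : sourcePermissions) {
            if (!allowsAssembly(p)) {
                return OptionalLong.empty();   // refuse the whole merge
            }
            result &= p;                       // keep only what every source grants
        }
        return OptionalLong.of(result);
    }

    public static void main(String[] args) {
        long ownFile = ALL;
        long fromPartner = PRINT_LOW_RES | PRINT_HIGH_RES | ASSEMBLE;
        System.out.println(mergedPermissions(Arrays.asList(ownFile, fromPartner)));
    }
}
```

If the result is present, its value is what you would apply as the permissions of the merged PDF; if it is empty, the document that forbids assembly should be left out or its owner asked for permission.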
Final remark: I'm the original developer of iText and I'm responsible for introducing the unethicalreading parameter. I've chosen the name unethicalreading only to make sure people are aware of what they are doing. It doesn't mean that using that parameter is always unethical or illegal.