I'm trying to get the original document out of a signed PDF in order to compare its hash with that of a stored document.
This is really easy when the document has several signatures: with Acrobat Reader you can go to the previous revision of the document, save it, and that's it.
Surprisingly, this does not work with the first signature, where there is no straightforward way to get the original data.
As it is not possible to do this with the Reader, I have tried to do it programmatically with iTextSharp. However, although I have searched extensively, I have not found out how to do it. The only relevant post I found is this one, but it offers no solution.
Has anyone faced this problem and found a solution?
Thanks in advance.
EDIT: Here is the code that extracts the data, based on mkl's answer. Read the comments on that answer to be aware of the problem with the non-fixed length of the unsigned PDFs.
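using System.IO;
using System.Text;
// Read through ISO-8859-1 so that every byte maps to exactly one char and back;
// Encoding.Default is UTF-8 on .NET Core and later and would corrupt the binary PDF data.
Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
String sOriginalText = File.ReadAllText("FileSigned.pdf", latin1);
// Offset of the previous revision's cross-reference section, taken from the last "/Prev" entry.
int iPrevNumberPosition = sOriginalText.LastIndexOf("]/Prev ") + "]/Prev ".Length;
int iPrevNumberEndPosition = sOriginalText.IndexOf(">", iPrevNumberPosition);
String sPrevOffset = sOriginalText.Substring(iPrevNumberPosition, iPrevNumberEndPosition - iPrevNumberPosition);
// The previous revision ends with that same offset followed by "%%EOF" (CRLF line ends assumed).
int iPrevOffsetPosition = sOriginalText.IndexOf(sPrevOffset + "\r\n%%EOF");
int iEndPosition = sOriginalText.IndexOf("%%EOF", iPrevOffsetPosition) + "%%EOF".Length;
String sOutText = sOriginalText.Substring(0, iEndPosition);
File.WriteAllText("c:/OriginalFile.pdf", sOutText, latin1);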
Whether your task of getting the original document out of a signed PDF is feasible at all depends on how the signature was originally applied.
If the signature was applied in append mode (i.e. according to the language of the PDF specification ISO 32000-1:2008 as an incremental update, cf. section 7.5.6), you merely have to cut off this appended, incremental update revision.
As you have a stored document which, presumably, became the document you are inspecting once it was signed, you can simply cut the signed file at the length of the stored one and then compare the two, e.g. using hashes. This suffices to show that the signed document is derived from your original one. There may have been other, intermediary revisions, though, as you might just have cut off multiple incremental updates.
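The length-based check described above can be sketched as follows. This is only a minimal illustration, assuming both files fit in memory; the class name `PrefixCheck` and the method name `IsPrefixOf` are hypothetical and not part of iTextSharp or any other library.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;

public static class PrefixCheck
{
    // Returns true if the first stored.Length bytes of the signed file
    // hash to the same value as the stored (unsigned) file.
    public static bool IsPrefixOf(byte[] stored, byte[] signed)
    {
        // A signed file that is shorter than the stored one cannot contain it.
        if (stored.Length > signed.Length)
            return false;

        using (SHA256 sha = SHA256.Create())
        {
            byte[] storedHash = sha.ComputeHash(stored);
            // Hash only the leading stored.Length bytes of the signed file.
            byte[] prefixHash = sha.ComputeHash(signed, 0, stored.Length);
            return storedHash.SequenceEqual(prefixHash);
        }
    }
}
```

It could be called as `PrefixCheck.IsPrefixOf(File.ReadAllBytes("Stored.pdf"), File.ReadAllBytes("FileSigned.pdf"))`, where `Stored.pdf` is an assumed file name. A result of `true` only shows that the signed file starts with the stored bytes; it says nothing about what was appended afterwards.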
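In general you can find the prior revision by following the /Prev trailer entry of your signed PDF to the cross reference table of the prior revision and from there moving onwards to the document end marker %%EOF. This works because in an incremental update the added trailer dictionary contains a Prev entry giving the location of the previous cross-reference section (cf. section 7.5.6 of ISO 32000-1:2008).
In case of PDFs using cross reference streams instead of cross reference tables, there is an analogous Prev entry in the cross-reference stream dictionary; it gives the byte offset of the previous cross-reference stream and has the same function as the Prev entry in the trailer dictionary.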
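Following the Prev entry can be sketched on the raw bytes as shown here. This is a rough illustration under several assumptions: the file is not linearized, the whole file fits in memory, and the text "/Prev" or "%%EOF" does not happen to occur inside binary stream data in the regions searched. The names `PreviousRevision` and `Extract` are hypothetical.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class PreviousRevision
{
    // ISO-8859-1 maps each byte to one char, so string indexes equal byte offsets.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Returns the bytes up to and including the "%%EOF" of the revision before
    // the last incremental update, or null if no Prev entry can be found.
    public static byte[] Extract(byte[] pdf)
    {
        string text = Latin1.GetString(pdf);

        // 1. The last "startxref" points at the newest cross-reference section.
        int startXref = text.LastIndexOf("startxref", StringComparison.Ordinal);
        if (startXref < 0) return null;
        Match offset = Regex.Match(text.Substring(startXref), @"startxref\s+(\d+)");
        if (!offset.Success) return null;
        int xrefPos = int.Parse(offset.Groups[1].Value);
        if (xrefPos >= startXref) return null;

        // 2. Between that section and "startxref" lies the trailer (or the
        //    cross-reference stream dictionary) that holds the Prev entry.
        string newest = text.Substring(xrefPos, startXref - xrefPos);
        Match prev = Regex.Match(newest, @"/Prev\s+(\d+)");
        if (!prev.Success) return null;
        int prevPos = int.Parse(prev.Groups[1].Value);
        if (prevPos >= text.Length) return null;

        // 3. The previous revision ends at the first "%%EOF" after its own
        //    cross-reference section.
        int eof = text.IndexOf("%%EOF", prevPos, StringComparison.Ordinal);
        if (eof < 0) return null;
        int end = eof + "%%EOF".Length;

        byte[] result = new byte[end];
        Array.Copy(pdf, result, end);
        return result;
    }
}
```

The sketch cuts directly after `%%EOF`. Whether the stored file carried an end-of-line marker after `%%EOF` cannot be told from the signed file alone, so a hash comparison may need to try the variants with and without a trailing end-of-line, or fall back to cutting at the stored file's length.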
You should be aware, though, that the appended, incremental update revision can contain other changes in addition to the signature. Thus, even if the previous revision corresponds with your stored document, you still only know that the signed document is based on your saved one.
If the signature was not applied in append mode, you are out of luck: Programs manipulating PDFs (e.g. for signing) might completely rearrange the binary contents of your document, possibly even renumbering objects, changing compression, removing unused objects, etc., while the appearance of the document remains the same.