get original content of a pdf signed with itextsha

Posted 2019-02-20 01:39

Question:

I'm trying to get the original document of a signed PDF in order to compare its hash with that of a stored document.

This is really easy when the document has several signatures: in Acrobat Reader you can go to the previous revision of the document, save it, and that's it.

Surprisingly, this does not work for the first signature, where there is no straightforward way to get the original data.

As it is not possible to do this with the Reader, I have tried to do it programmatically with iTextSharp. However, despite searching extensively, I have not found out how. The only relevant post I found is this one, but it offers no solution.

Has anyone faced this problem and found a solution?

Thanks in advance.

EDIT: Here is the code that extracts the data, based on mkl's answer. Read the comments on that answer to be aware of the problem that the length of the unsigned PDF is not fixed.

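using System.IO;
using System.Text;

// ISO-8859-1 maps each byte to exactly one char, so string indices equal byte offsets.
// Encoding.Default may be a multi-byte or lossy encoding and can corrupt binary PDF data.
Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
String sOriginalText = File.ReadAllText("FileSigned.pdf", latin1);
// Value of /Prev in the last trailer: the offset of the previous revision's cross-reference table.
int iPrevValuePosition = sOriginalText.LastIndexOf("]/Prev ") + "]/Prev ".Length;
int iPrevValueEndPosition = sOriginalText.IndexOf(">", iPrevValuePosition);
String sPrevOffset = sOriginalText.Substring(iPrevValuePosition, iPrevValueEndPosition - iPrevValuePosition);
// The previous revision ends with that same offset (after startxref) followed by %%EOF.
int iStartXrefValuePosition = sOriginalText.IndexOf(sPrevOffset + "\r\n%%EOF");
int iEndPosition = sOriginalText.IndexOf("%%EOF", iStartXrefValuePosition) + "%%EOF".Length;
// Any end-of-line bytes that followed %%EOF in the original file are not recovered here.
File.WriteAllText("c:/OriginalFile.pdf", sOriginalText.Substring(0, iEndPosition), latin1);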
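
One reading of the length problem mentioned in the EDIT is that the original file may or may not have had end-of-line bytes after its final %%EOF, so the exact cut point is not known in advance. The following is only a sketch under that assumption. It swaps in a different technique from the /Prev lookup above: it scans the raw bytes for every %%EOF marker and lists each offset at which a revision could end. The names RevisionEnds and CandidateEnds are hypothetical and not part of iTextSharp.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class RevisionEnds
{
    // Returns every offset at which a revision could end: directly after each
    // "%%EOF" marker, and after an optional CR, LF, or CR LF that follows it.
    public static List<int> CandidateEnds(byte[] pdf)
    {
        byte[] marker = Encoding.ASCII.GetBytes("%%EOF");
        var ends = new List<int>();
        for (int i = 0; i + marker.Length <= pdf.Length; i++)
        {
            // Compare the bytes at position i with the marker.
            bool match = true;
            for (int j = 0; j < marker.Length && match; j++)
                match = pdf[i + j] == marker[j];
            if (!match) continue;

            int end = i + marker.Length;
            ends.Add(end);                                   // cut right after "%%EOF"
            if (end < pdf.Length && pdf[end] == (byte)'\n')
            {
                ends.Add(end + 1);                           // cut after "%%EOF" + LF
            }
            else if (end < pdf.Length && pdf[end] == (byte)'\r')
            {
                ends.Add(end + 1);                           // cut after "%%EOF" + CR
                if (end + 1 < pdf.Length && pdf[end + 1] == (byte)'\n')
                    ends.Add(end + 2);                       // cut after "%%EOF" + CR LF
            }
        }
        return ends;
    }
}
```

Each candidate prefix of the signed file can then be hashed and compared with the stored hash. Note that the byte sequence %%EOF could in principle also occur inside stream data, so a candidate is not guaranteed to be a real revision boundary; only a matching hash confirms it.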

Answer 1:

Whether your task of getting the original document of a signed PDF is feasible at all depends on how the signature was originally applied.

  1. If the signature was applied in append mode (i.e., in the language of the PDF specification ISO 32000-1:2008, as an incremental update, cf. section 7.5.6), you merely have to cut off this appended incremental update revision.

    As you have a stored document which presumably became the document you are inspecting once it was signed, you can simply cut the signed file at the length of the stored one and then compare the two, e.g. using hashes. This suffices to show that the signed document is derived from your original one. There may have been other, intermediate revisions, though, as you might have just cut off multiple incremental updates.

    In general, you can find the prior revision by following the /Prev trailer entry of your signed PDF to the cross-reference table of the prior revision and moving on from there to the document end marker %%EOF, because in an incremental update

    the added trailer shall contain all the entries except the Prev entry (if present) from the previous trailer, whether modified or not. In addition, the added trailer dictionary shall contain a Prev entry giving the location of the previous cross-reference section (see Table 15). Each trailer shall be terminated by its own end-of-file (%%EOF) marker.

    In the case of PDFs using cross-reference streams instead of cross-reference tables, there is an analogous Prev entry in the cross-reference stream dictionary:

    (Present only if the file has more than one cross-reference stream; not meaningful in hybrid-reference files; see 7.5.8.4, "Compatibility with Applications That Do Not Support Compressed Reference Streams") The byte offset in the decoded stream from the beginning of the file to the beginning of the previous cross-reference stream. This entry has the same function as the Prev entry in the trailer dictionary (Table 15).

    You should be aware, though, that the appended, incremental update revision can contain other changes in addition to the signature. Thus, even if the previous revision corresponds with your stored document, you still only know that the signed document is based on your saved one.

  2. If the signature was not applied in append mode, you are out of luck: Programs manipulating PDFs (e.g. for signing) might completely rearrange the binary contents of your document, possibly even renumbering objects, changing compression, removing unused objects, etc., while the appearance of the document remains the same.
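
For the append-mode case in item 1, the cut-and-compare check can be sketched roughly as follows. This is a minimal illustration, not a complete verification: it assumes both files fit in memory, uses SHA-256 as an arbitrary choice of hash, and the names RevisionCheck and StartsWithStored are hypothetical.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

static class RevisionCheck
{
    // Returns true if the first stored.Length bytes of the signed file hash to the
    // same SHA-256 value as the stored file, i.e. the signed file starts with it.
    public static bool StartsWithStored(byte[] signed, byte[] stored)
    {
        // A signed file shorter than the stored one cannot contain it as a prefix.
        if (stored.Length > signed.Length) return false;

        using (SHA256 sha = SHA256.Create())
        {
            byte[] prefixHash = sha.ComputeHash(signed, 0, stored.Length);
            byte[] storedHash = sha.ComputeHash(stored);
            return prefixHash.SequenceEqual(storedHash);
        }
    }
}
```

A call might look like `RevisionCheck.StartsWithStored(File.ReadAllBytes("FileSigned.pdf"), File.ReadAllBytes("Stored.pdf"))`, where Stored.pdf is a placeholder name for the stored document. A true result only shows that the signed file is built on top of the stored one; as noted above, the appended revisions may still contain changes other than the signature.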