I would like to know how I can get the original content from a signed PDF document using the iText Java library, or another one.
Thanks
UPDATE 1:
Possible example:
PdfReader reader = new PdfReader(PATH_TO_PDF);
AcroFields fields = reader.getAcroFields();
ArrayList<String> signatures = fields.getSignatureNames();
for (String signature : signatures) {
    // Start revision extraction
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] bb = new byte[8192];
    InputStream ip = fields.extractRevision(signature);
    int n = 0;
    while ((n = ip.read(bb)) > 0) {
        out.write(bb, 0, n);
    }
    out.close();
    ip.close();
    // Hash the bytes of the extracted revision
    MessageDigest md = MessageDigest.getInstance("SHA1");
    byte[] resum = md.digest(out.toByteArray());
    // End revision extraction
}
Note 1: When the document has multiple signatures, this example extracts the revision for each of them.
Note 2: However, the resulting hash does not match the hash of the original (unsigned) document.
Please take a look at the following image:
In this case, you have a PDF file (starting with %PDF-1. and ending with %%EOF) and the digital signature is part of the document itself. It is the value of the /Contents key in the signature dictionary, that is in turn the value of the /V entry in the signature field dictionary.
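A note on the hash mismatch from the question: hashing an extracted revision cannot reproduce the hash of the unsigned file, because that revision already contains the signature dictionary. The digest that a PDF signature covers is computed over the byte ranges listed in the /ByteRange entry of the signature dictionary, and those ranges skip the /Contents value. Below is a minimal, library-free sketch of that idea. The class name ByteRangeDigest, its methods, and the stand-in bytes are illustrative assumptions; this is not iText API and the input is not a real PDF.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical helper, not part of iText.
class ByteRangeDigest {

    // Hashes only the bytes selected by a /ByteRange-style array:
    // [offset1, length1, offset2, length2, ...].
    static byte[] digest(byte[] data, long[] byteRange, String algorithm)
            throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        for (int i = 0; i + 1 < byteRange.length; i += 2) {
            md.update(data, (int) byteRange[i], (int) byteRange[i + 1]);
        }
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in bytes: "<SIGNATURE>" plays the role of the /Contents value.
        byte[] fake = "HEAD<SIGNATURE>TAIL".getBytes(StandardCharsets.US_ASCII);
        // Covers "HEAD" (offset 0, length 4) and "TAIL" (offset 15, length 4).
        long[] byteRange = {0, 4, 15, 4};

        byte[] ranged = digest(fake, byteRange, "SHA-256");
        byte[] whole = MessageDigest.getInstance("SHA-256").digest(fake);
        byte[] headTail = MessageDigest.getInstance("SHA-256")
                .digest("HEADTAIL".getBytes(StandardCharsets.US_ASCII));

        System.out.println("ranged digest equals digest of HEADTAIL: "
                + Arrays.equals(ranged, headTail));
        System.out.println("ranged digest equals digest of whole input: "
                + Arrays.equals(ranged, whole));
    }
}
```

With a real file, the byte range values would come from the signature dictionary rather than being hard-coded as they are here.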
It is not possible to get the original PDF as it once was, because the original PDF was altered: objects were renumbered, a signature field was either added or "filled out" by adding a signature dictionary.
You can remove the signature, but that won't give you the original PDF file.
PdfReader reader = new PdfReader(SIGNED);
AcroFields acroFields = reader.getAcroFields();
acroFields.removeField("sig");
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(UNSIGNED));
stamper.close();
reader.close();
In this case, SIGNED is the path to a file with a signature named "sig". We remove the complete signature (including the signature field). The path to the resulting file is UNSIGNED, and that is a file in which there is no longer any trace of the signature field "sig". However, it is not the original PDF that was signed.
Now look at the following image:
This shows a PDF with three signatures. The first signature was added the way I previously described: you can no longer get the original document.
However, the second and third signatures were added in append mode. This is the only way to add extra signatures, because altering revision 1 would break the first signature.
If you have revision 3 (marked Rev3), it is very easy to retrieve revisions 1 and 2 (Rev1 and Rev2). This is shown in the Signatures example:
PdfReader reader = new PdfReader(SIGNED);
AcroFields af = reader.getAcroFields();
FileOutputStream os = new FileOutputStream(REVISION);
byte bb[] = new byte[1028];
InputStream ip = af.extractRevision("first");
int n = 0;
while ((n = ip.read(bb)) > 0)
os.write(bb, 0, n);
os.close();
ip.close();
In this example "first" is the name of the signature field, SIGNED is the path to the file with the signature and REVISION is the path to the revision that results from this operation.
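Extracting an earlier revision is possible because append mode only adds bytes after the end of the previous revision, so each earlier revision is a prefix of the final file. The sketch below illustrates this on stand-in bytes by cutting the data after the n-th %%EOF marker. The class name RevisionBoundaries and its methods are hypothetical, and the scan is deliberately naive: a %%EOF sequence inside a stream would fool it, so for real files extractRevision remains the better choice.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of iText.
class RevisionBoundaries {

    private static final byte[] EOF_MARKER =
            "%%EOF".getBytes(StandardCharsets.US_ASCII);

    // Returns the offset just past each %%EOF marker found in the data.
    static List<Integer> revisionEnds(byte[] data) {
        List<Integer> ends = new ArrayList<>();
        for (int i = 0; i + EOF_MARKER.length <= data.length; i++) {
            if (matchesAt(data, i)) {
                ends.add(i + EOF_MARKER.length);
                i += EOF_MARKER.length - 1; // continue after this marker
            }
        }
        return ends;
    }

    private static boolean matchesAt(byte[] data, int offset) {
        for (int j = 0; j < EOF_MARKER.length; j++) {
            if (data[offset + j] != EOF_MARKER[j]) {
                return false;
            }
        }
        return true;
    }

    // Returns the prefix of the data that ends with the n-th marker (1-based).
    static byte[] revision(byte[] data, int number) {
        List<Integer> ends = revisionEnds(data);
        if (number < 1 || number > ends.size()) {
            throw new IllegalArgumentException("No such revision: " + number);
        }
        return Arrays.copyOf(data, ends.get(number - 1));
    }

    public static void main(String[] args) {
        // Stand-in bytes with three appended "revisions", not a real PDF.
        byte[] fake = "%PDF-1.7 rev1 %%EOF\nrev2 %%EOF\nrev3 %%EOF\n"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println("revisions found: " + revisionEnds(fake).size());
        System.out.println(new String(revision(fake, 1), StandardCharsets.US_ASCII));
    }
}
```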