How to compute PDF signature hash?

Posted 2019-08-01 01:46

This question is related to an earlier one of mine, but a bit more specific. I suspect I am not computing the hash of my PDF properly.

I would like to compute the SHA256 hash of a signed PDF.

According to PDF32000 I should:

  1. Get the /ByteRange values
  2. Concatenate the two chunks
  3. Compute the SHA256

Here is what I did:

$ grep -aPo 'ByteRange\[\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s*\]' dummy-signed.pdf
ByteRange[ 0 59718 72772 5058]

$ dd if=dummy-signed.pdf of=head.bin bs=1 skip=0 count=59718
59718 bytes (60 kB, 58 KiB) copied, 0.630196 s, 94.8 kB/s

$ dd if=dummy-signed.pdf of=tail.bin bs=1 skip=72772 count=5058
5058 bytes (5.1 kB, 4.9 KiB) copied, 0.064317 s, 78.6 kB/s

$ cat head.bin tail.bin > whole.bin

$ sha256sum whole.bin
04b69f55f12fa5cc7923f4307154f2702efde43b32e4a8d9dbb0507a56fcecd3  whole.bin
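
The same three steps can be done without temporary files and without the slow bs=1 copies. This is only a minimal sketch: the helper name byterange_sha256 is hypothetical, and it assumes head -c, tail -c and sha256sum are available and that the file is at least as long as the /ByteRange says.

```shell
#!/bin/bash
# Hypothetical helper: hash the two /ByteRange chunks of a signed PDF.
# Arguments: file, then the four /ByteRange numbers
# (offset1 length1 offset2 length2).
byterange_sha256() {
    local file=$1 off1=$2 len1=$3 off2=$4 len2=$5
    {
        # Keep the first (offset + length) bytes, then the last `length`
        # bytes of that prefix: this yields bytes [offset, offset + length).
        head -c "$((off1 + len1))" "$file" | tail -c "$len1"
        head -c "$((off2 + len2))" "$file" | tail -c "$len2"
    } | sha256sum | cut -d' ' -f1
}
```

Called as byterange_sha256 dummy-signed.pdf 0 59718 72772 5058, it should print the same digest as the sha256sum step above, since it hashes the same bytes.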

I checked that I am not including the < and > chars:

$ hexdump -C head.bin | tail -n3
0000e930  20 20 20 20 20 20 20 20  20 20 20 20 20 2f 43 6f  |             /Co|
0000e940  6e 74 65 6e 74 73                                 |ntents|
0000e946

$ hexdump -C tail.bin | head -n3
00000000  2f 46 69 6c 74 65 72 2f  41 64 6f 62 65 2e 50 50  |/Filter/Adobe.PP|
00000010  4b 4c 69 74 65 2f 4d 28  44 3a 32 30 31 39 30 31  |KLite/M(D:201901|
00000020  32 38 31 33 34 30 35 38  2b 30 31 27 30 30 27 29  |28134058+01'00')|

Unfortunately, my result seems to be wrong. After decoding the PKCS7 signature I double-checked that the signature algorithm is sha256WithRSAEncryption, yet the hash I recover by verifying the signature differs from the one I computed.

My /SubFilter is:

$ grep -aPo '/SubFilter.*?(?=>)' dummy-signed.pdf
/SubFilter/adbe.pkcs7.detached/Type/Sig

And my PDF version is:

$ grep -aPo '%PDF-\d.\d' dummy-signed.pdf
%PDF-1.6

So according to PDF32000, with adbe.pkcs7.detached and PDF 1.6 the hash should be SHA256, which is consistent with what I found in the PKCS7.

For the record, this is how I get the hash from the signature:

#!/bin/bash
PKCS7='out.pkcs7'

# Extract Digest (SHA256)
OFFSET=$(openssl asn1parse -inform der -in $PKCS7 | \
    perl -ne 'print $1 + $2 if /(\d+):d=\d\s+hl=(\d).*?256 prim.*HEX DUMP/m')
dd if=$PKCS7 of=signed-sha256.bin bs=1 skip=$OFFSET count=256

# Extract Public key 
openssl pkcs7 -print_certs -inform der -in $PKCS7 | \
    tac | sed '/-----BEGIN/q' | tac > client.pem
openssl x509 -in client.pem -pubkey -noout > client.pub.pem

# Verify the signature
openssl rsautl -verify -pubin -inkey client.pub.pem < signed-sha256.bin > verified.bin

# Get Hash and compare with the computed hash from the PDF
openssl asn1parse -inform der -in verified.bin | grep -Po '\[HEX DUMP\]:\K\w+$' | tr A-F a-f

$ ./verify-signature.sh
256+0 records in
256+0 records out
256 bytes copied, 0.029548 s, 8.7 kB/s
2a3f629f7bdce750321da7f219ec5759dc9ed14818acbd3cd0b6092d5371c03a

You can get the test PDF file dummy-signed.pdf from my gist:

curl https://gist.githubusercontent.com/nowox/94dd54e484df877e1232c18bd7b91c97/raw/d249f3757137e9b665e895c900f08b1156f1bc4f/dummy-signed.pdf.base64 | base64 --decode > dummy-signed.pdf

1 answer
干净又极端
#2 · 2019-08-01 02:30

In short

You are trying to extract the wrong hash value from the signature container.

In detail

I didn't recognize this earlier because I'm not really an openssl expert. Analyzing the example PDF, though, the cause of the confusion became clear.

In a PKCS#7 / CMS signature container there usually are (at least) two hash values of interest:

  • the hash value of the signed document data in the messageDigest signed attribute and
  • the hash value of the signed attributes (in case of the old RSA signing scheme) in the encrypted signature bytes.

The messageDigest signed attribute in the signature container of your example document looks like this (the output looks different if you dump the ASN.1 with openssl, but the value should be recognizable nonetheless):

5306   47: . . . . . . SEQUENCE {
    <06 09>
5308    9: . . . . . . . OBJECT IDENTIFIER messageDigest (1 2 840 113549 1 9 4)
         : . . . . . . . . (PKCS #9)
    <31 22>
5319   34: . . . . . . . SET {
    <04 20>
5321   32: . . . . . . . . OCTET STRING    
         : . . . . . . . . . 04 B6 9F 55 F1 2F A5 CC    ...U./..
         : . . . . . . . . . 79 23 F4 30 71 54 F2 70    y#.0qT.p
         : . . . . . . . . . 2E FD E4 3B 32 E4 A8 D9    ...;2...
         : . . . . . . . . . DB B0 50 7A 56 FC EC D3                            
         : . . . . . . . . }
         : . . . . . . . }
         : . . . . . . }

As you can see, this attribute contains the hash value you calculated.
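
To read that attribute with openssl instead of a full ASN.1 dumper, one option is to filter the asn1parse output. The following is only a sketch: the helper name is hypothetical, and it assumes the asn1parse line layout shown in the comment (an OBJECT line ending in :messageDigest, followed shortly by an OCTET STRING line with a [HEX DUMP] value).

```shell
#!/bin/bash
# Hypothetical helper: read `openssl asn1parse` output on stdin and print
# the messageDigest attribute value as lowercase hex.
# Assumed layout of the relevant lines (offsets and depths will vary):
#   5308:d=7  hl=2 l=   9 prim: OBJECT            :messageDigest
#   5319:d=7  hl=2 l=  34 cons: SET
#   5321:d=8  hl=2 l=  32 prim: OCTET STRING      [HEX DUMP]:04B69F...
message_digest_from_asn1parse() {
    awk '
        /:messageDigest$/ { armed = 1; next }   # attribute OID found
        armed && /\[HEX DUMP\]:/ {              # first hex value after it
            sub(/.*\[HEX DUMP\]:/, "")
            print tolower($0)
            exit
        }
    '
}
```

With the container from the question it would be used as openssl asn1parse -inform der -in out.pkcs7 | message_digest_from_asn1parse, and the printed value is the one to compare against the SHA256 of the two /ByteRange chunks.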

You, on the other hand, try to extract the signed hash value from the decrypted signature bytes, which is not the hash of the document but the hash of the signed attributes!

Additionally, something appears to go wrong in that extraction step: the value you should retrieve is

AB86B27177E388A1EE69A5C7479D74621E84473E0CAB5C647471B724FEFCE826

and not the

2a3f629f7bdce750321da7f219ec5759dc9ed14818acbd3cd0b6092d5371c03a

you got.
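
The hash inside the signature bytes can also be recomputed independently. According to the CMS specification (RFC 5652, section 5.4), it is calculated over the DER encoding of the signed attributes with the leading [0] IMPLICIT tag byte (0xA0) replaced by the SET OF tag (0x31). A minimal sketch, assuming the signed attributes have already been cut out of the container into their own file, tag and length bytes included; the helper name is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: SHA-256 over the signed attributes the way the
# signer hashes them.
# Argument: a file holding the raw signedAttrs bytes, starting with 0xA0.
signed_attrs_sha256() {
    local file=$1
    {
        printf '\061'          # 0x31, the SET OF tag, in place of 0xA0
        tail -c +2 "$file"     # everything after the original tag byte
    } | sha256sum | cut -d' ' -f1
}
```

The offset and size of the signed attributes (header length plus content length) can be read from the cont [ 0 ] entry inside the SignerInfo in the asn1parse output, and the bytes copied out with dd as in the question. The result should equal the hash recovered from the decrypted signature bytes.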
