Premise
I'm trying to decode the data from the barcode format currently used on tickets issued by Deutsche Bahn (the German railway). I have found this very useful website (in German) that already does a similar thing and offers a Python script.
The website states that the data is compressed with zlib, the resulting blob is signed with DSA, and all of it is stored in the barcode (Aztec format).
Example of such a barcode
Problem
I have used the script provided on the website to successfully decode a ticket. Installed the python-pyasn1 library. Read the barcode (used BCTester as per instructions, had some trouble with NeoReader app) and converted the result to hex. Saved the hex data as plain text file (as is for some reason required by the script) and parsed the file with the script. It worked.
But the script is doing too much. I'd like to do the parsing myself, but I can't get the zlib decompression to work and I understand too little of the code to make sense of it. I know almost no Python. I have some programming experience, though.
If you simply look at the data from the barcode, it looks like this: https://gist.github.com/oelna/096787dc18596aaa4f5f
The first question would be: What is the DSA signature and do I need to split it from the actual compressed data first?
The second: What could a simple Python script look like that reads the barcode blob from a file and simply decompresses it, so I can further parse the format? I had something in mind like:
#!/usr/bin/env python
import zlib
ticket = open('ticketdata.txt').read()
print zlib.decompress(ticket)
but it's not working. Any hint in the right direction would be appreciated.
Here is the hex data that is readable by the script if saved to a file:
23 55 54 30 31 30 30 38 30 30 30 30 30 31 30 2c 02 14 1c 3d e9 2d cd 5e c4 c0 56 bd ae 61 3e 54 ad a1 b3 26 33 d2 02 14 40 75 03 d0 cf 9c c1 f5 70 58 bd 59 50 a7 af c5 eb 0a f4 74 00 00 00 00 30 32 37 31 78 9c 65 50 cb 4e c3 30 10 e4 53 2c 71 43 4a d9 f5 2b 36 b7 84 04 52 01 55 51 40 1c 51 01 23 2a 42 0e 21 15 3f c7 8d 1f 63 36 11 52 2b 7c f1 78 76 76 66 bd f7 8f 4d 5d 54 c4 44 ce 10 05 d2 eb 78 5b ac 32 7b b4 77 c8 11 6b 62 c7 d6 79 aa ea aa 16 e1 b2 22 4d c4 01 ad 36 58 61 ca 6b 30 c6 e5 64 a0 b6 97 0f a6 a9 6f d6 71 df c7 cf 3e 7f 37 93 66 8e c6 71 de 92 4c c0 e1 22 0d fd 57 7a cb ee b6 cf ef 69 54 fd 66 44 05 31 d0 03 18 01 05 40 04 70 9c 51 46 ad 38 49 33 00 86 20 dd 42 88 04 22 5f a6 a1 db f6 78 79 d4 79 95 76 1f 3f df fd e7 98 86 16 b1 30 0b 65 d6 3c bd 2a 15 ce d8 ab e5 79 9d 47 7b da 34 13 c7 34 73 5a 6b 0b 35 72 d9 5c 0d bb ae 53 aa e8 5f 86 b4 01 e9 25 8d 0d 50 8e 72 3c 39 3c b2 13 94 82 74 ce 2d c7 b3 41 8b ed 4c 9f f5 0b e2 85 6c 01 8c fe c7 b8 e9 87 8c d9 f1 90 28 a3 73 fe 05 6d de 5f f1
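The dump above is text, so zlib cannot consume it directly; it has to be turned back into raw bytes first. A minimal Python 3 sketch of that conversion follows. The helper name hex_dump_to_bytes is made up for illustration, and the sample is just the first 16 bytes of the dump.

```python
def hex_dump_to_bytes(hex_text: str) -> bytes:
    """Convert a whitespace-separated hex dump ("23 55 54 ...") to raw bytes."""
    # Drop every kind of whitespace first, so line breaks in the file are harmless.
    return bytes.fromhex("".join(hex_text.split()))


# First 16 bytes of the dump above, used here only as a small demonstration.
sample = "23 55 54 30 31 30 30 38 30 30 30 30 30 31 30 2c"
raw = hex_dump_to_bytes(sample)
print(len(raw), raw)  # 16 b'#UT010080000010,'
```

For the whole ticket, the same helper could be fed the contents of the text file, for example hex_dump_to_bytes(open('ticketdata.txt').read()).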
Update/Solution:
Mark Adler's tip set me on the right track. It took me hours, but I hacked together a working solution to this particular problem. If I had been smarter, I would have recognized the zlib header 78 9C at offset 68. Simply split the data at this point and the second half decompresses without complaint. Be warned, very sad Python:
import zlib

dsa_signature = ''
zlib_data = ''
cursor = 0
# Read byte by byte: the first 68 bytes are the header/signature part,
# everything after that is the zlib stream. (Python 2 syntax.)
with open('ticketdata.txt', "rb") as fp:
    chunk = fp.read(1)
    while chunk:
        if(cursor < 68):
            dsa_signature += chunk
        else:
            zlib_data += chunk
        chunk = fp.read(1)
        cursor = cursor + 1

print "\nSignature:"
print "%s\n" % dsa_signature
print "\nCompressed data:"
print "%s\n" % zlib_data
print "\nDecoded:"
print zlib.decompress(zlib_data)
If there is an easy solution to this, feel free to comment. I'll continue working on this a little more and try to make it more robust, so that it actively seeks out the zlib header without hardcoding the offset. The first half is an identifier code, like #UT010080000060, followed by an ASN.1 DSA signature, which luckily I don't need to verify or modify.
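One possible way to seek out the zlib header instead of hardcoding offset 68 is sketched below in Python 3. It is only a sketch under assumptions: the function name find_zlib_stream is invented here, the blob is assumed to be raw bytes already, and the demo blob is synthetic (a 68-byte dummy prefix plus freshly compressed data), not a real ticket. Candidate offsets are filtered with the header rule from the zlib format (deflate method in the low nibble, header value divisible by 31), and a trial decompression weeds out false positives.

```python
import zlib


def find_zlib_stream(blob: bytes):
    """Return (offset, decompressed) for the first offset that decompresses cleanly."""
    for offset in range(len(blob) - 1):
        cmf, flg = blob[offset], blob[offset + 1]
        # A zlib header has compression method 8 (deflate) in the low nibble
        # of the first byte, and the 16-bit header value is a multiple of 31.
        if cmf & 0x0F != 8 or (cmf * 256 + flg) % 31 != 0:
            continue
        try:
            return offset, zlib.decompress(blob[offset:])
        except zlib.error:
            continue  # header-like bytes that are not a real stream; keep scanning
    raise ValueError("no zlib stream found")


# Synthetic demo data: 68 placeholder bytes followed by a compressed payload.
prefix = b"#UT01008000001" + bytes(54)
blob = prefix + zlib.compress(b"example payload")
offset, payload = find_zlib_stream(blob)
print(offset, payload)
```

The trial decompression matters because two ordinary bytes can pass the header check by accident; in the placeholder prefix, the ASCII characters "80" do exactly that and are rejected only when decompression fails.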
There is a complete and valid zlib stream starting at offset 68 in your hex data, and going to the end. It decompresses to:
If you drop the first 68 bytes of your example, zlib.decompress() will return the above. It's up to you to figure out what the first 68 bytes are.
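As a starting point for those 68 bytes, here is one possible reading of them, sketched in Python 3. The byte values are copied from the hex dump in the question; the interpretation is an assumption, not a confirmed specification. The bytes 30 2c at offset 14 look like the start of a DER SEQUENCE of length 44 holding two 20-byte INTEGERs (which would fit a DSA signature's r and s), followed by four zero bytes and the ASCII digits 0271, which appears to match the number of compressed bytes that follow. The helper read_der_integer is hypothetical and handles only short-form lengths.

```python
# First 68 bytes of the hex dump from the question, grouped by assumed meaning.
HEADER_HEX = (
    "23 55 54 30 31 30 30 38 30 30 30 30 30 31 "  # 14 ASCII bytes: "#UT01008000001"
    "30 2c "                                      # DER SEQUENCE tag, length 0x2c = 44
    "02 14 1c 3d e9 2d cd 5e c4 c0 56 bd ae 61 3e 54 ad a1 b3 26 33 d2 "  # INTEGER
    "02 14 40 75 03 d0 cf 9c c1 f5 70 58 bd 59 50 a7 af c5 eb 0a f4 74 "  # INTEGER
    "00 00 00 00 "                                # four zero bytes
    "30 32 37 31"                                 # ASCII "0271"
)


def read_der_integer(buf: bytes, pos: int):
    """Read one short-form DER INTEGER at pos; return (value, next_pos)."""
    if buf[pos] != 0x02:
        raise ValueError("expected INTEGER tag 0x02")
    length = buf[pos + 1]
    if length >= 0x80:
        raise ValueError("long-form lengths are not handled in this sketch")
    start = pos + 2
    return int.from_bytes(buf[start:start + length], "big"), start + length


header = bytes.fromhex(HEADER_HEX)      # fromhex ignores the spaces
ident = header[:14].decode("ascii")     # printable identifier part
if header[14] != 0x30:
    raise ValueError("expected SEQUENCE tag 0x30")
seq_len = header[15]
r, pos = read_der_integer(header, 16)   # assumed to be the DSA value r
s, pos = read_der_integer(header, pos)  # assumed to be the DSA value s
trailer = header[pos:]                  # zero bytes plus the ASCII digits

print(ident, seq_len, pos, trailer)
```

Under this reading, the "0," at the end of the printable prefix is not part of the identifier but the first two bytes of the signature structure.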