I need to reverse engineer CRC/Checksum algorithm implemented by windows CE executable. Being propritory protocol, it does not say anything about CRC/checksum algorithm. However, There is console interface that reports correct/calculated checksum and I can construct my own messages with random bits if message protocol is correct:
I have observed that,
Changing single bit in message changes checksum bytes completely.
Algorithm seems to be position dependent as I fed some single 1 bit messages in various message data positions with rest of the bits zero and all the time console reported different checksum. If it was simple additive checksum, checksum would have been identical.
I applied common XOR, LRC, Additive checksum algorithms, common CRC polynomials(Standerd, CCITT, X-modem) and gone through [CRC Reverse engineering essay][2] but unfortunately I cannot go past deducing the polynomial because message type is fixed so cannot create single 1 bit message.
My questions:
Are there any CRC/checksum algorithm properties that I can test against messages to determine if algorithm is checksum or polynomial based CRC?
Is there any way to relate error message seen in program disassembly with corrosponding assembly instructions?
What are the ways to debug/pinpoint disassembly code the moment it reports correct checksum on console? Memory dump or something?
Try CRC RevEng. Some quick attempts with your data were fruitless, but I didn't try very hard. Consider trying not just all ten message bytes, but also the last eight and the last six.
In addition you can find at that same site the most comprehensive list of known CRCs that I'm aware of.
Update:
It is highly likely that this is a CRC of some sort, or at least a linear operation over GF(2). It has this property that CRC's have: if two sequences have the same exclusive-or, then their CRC's also have the same exclusive-ors. For example, from your data (dropping the common prefix, though note that including the prefix or a portion of it does not change the result):
and
Given this fact, there is a way for you to construct a routine to calculate the check value without determining if it's a CRC or what the CRC parameters are.
Generate the 16-bit check value for all of the single bit messages, i.e. a single bit set in the six bytes of message data, with the rest of the message data bits zero. These messages are a complete set of basis vectors for this linear field. There are 48 of them. Also generate the check values for an all zero message. You already have a start at that with all zeros giving
2020
, the last bit set giving22b5
, etc. Exclusive-or the check value for all zeros (2020
) with each of the others. You now have 49 values of which 48 are for the basis vectors and one is the correction for the zero vector (which is non-zero likely due to pre and post conditioning of a CRC and the prefix bytes). For example, the value for the basis vector with the last bit set is0295
.Now you can use those 49 values to calculate the check value for any six-byte message. Exclusive-or together the values for all of the corresponding bits that are set to one in that message. Exclusive-or that with the check value for zero. The result will be the check value for that message.