I would like to implement this in C#
I have looked here: http://www.codeproject.com/KB/cpp/PEChecksum.aspx
And am aware of the ImageHlp.dll MapFileAndCheckSum function.
However, for various reasons, I would like to implement this myself.
The best I have found is here: http://forum.sysinternals.com/optional-header-checksum-calculation_topic24214.html
But, I don't understand the explanation. Can anyone clarify how the checksum is calculated?
Thanks!
Update
I from the code example, I do not understand what this means, and how to translate it into C#
sum -= sum < low 16 bits of CheckSum in file // 16-bit borrow
sum -= low 16 bits of CheckSum in file
sum -= sum < high 16 bits of CheckSum in file
sum -= high 16 bits of CheckSum in file
Update #2
Thanks, came across some Python code that does similar too here
def generate_checksum(self):
# This will make sure that the data representing the PE image
# is updated with any changes that might have been made by
# assigning values to header fields as those are not automatically
# updated upon assignment.
#
self.__data__ = self.write()
# Get the offset to the CheckSum field in the OptionalHeader
#
checksum_offset = self.OPTIONAL_HEADER.__file_offset__ + 0x40 # 64
checksum = 0
# Verify the data is dword-aligned. Add padding if needed
#
remainder = len(self.__data__) % 4
data = self.__data__ + ( '\0' * ((4-remainder) * ( remainder != 0 )) )
for i in range( len( data ) / 4 ):
# Skip the checksum field
#
if i == checksum_offset / 4:
continue
dword = struct.unpack('I', data[ i*4 : i*4+4 ])[0]
checksum = (checksum & 0xffffffff) + dword + (checksum>>32)
if checksum > 2**32:
checksum = (checksum & 0xffffffff) + (checksum >> 32)
checksum = (checksum & 0xffff) + (checksum >> 16)
checksum = (checksum) + (checksum >> 16)
checksum = checksum & 0xffff
# The length is the one of the original data, not the padded one
#
return checksum + len(self.__data__)
However, it's still not working for me - here is my conversion of this code:
using System;
using System.IO;
namespace CheckSumTest
{
class Program
{
static void Main(string[] args)
{
var data = File.ReadAllBytes(@"c:\Windows\notepad.exe");
var PEStart = BitConverter.ToInt32(data, 0x3c);
var PECoffStart = PEStart + 4;
var PEOptionalStart = PECoffStart + 20;
var PECheckSum = PEOptionalStart + 64;
var checkSumInFile = BitConverter.ToInt32(data, PECheckSum);
Console.WriteLine(string.Format("{0:x}", checkSumInFile));
long checksum = 0;
var remainder = data.Length % 4;
if (remainder > 0)
{
Array.Resize(ref data, data.Length + (4 - remainder));
}
var top = Math.Pow(2, 32);
for (int i = 0; i < data.Length / 4; i++)
{
if (i == PECheckSum / 4)
{
continue;
}
var dword = BitConverter.ToInt32(data, i * 4);
checksum = (checksum & 0xffffffff) + dword + (checksum >> 32);
if (checksum > top)
{
checksum = (checksum & 0xffffffff) + (checksum >> 32);
}
}
checksum = (checksum & 0xffff) + (checksum >> 16);
checksum = (checksum) + (checksum >> 16);
checksum = checksum & 0xffff;
checksum += (uint)data.Length;
Console.WriteLine(string.Format("{0:x}", checksum));
Console.ReadKey();
}
}
}
Can anyone tell me where I'm being stupid?
No one really answered the original question of "Can anyone define the Windows PE Checksum Algorithm?" so I'm going to define it as simply as possible. A lot of the examples given so far are optimizing for unsigned 32-bit integers (aka DWORDs), but if you just want to understand the algorithm itself at its most fundamental, it is simply this:
Using an unsigned 16-bit integer (aka a WORD) to store the checksum, add up all of the WORDs of the data except for the 4 bytes of the PE optional header checksum. If the file is not WORD-aligned, then the last byte is a 0x00.
Convert the checksum from a WORD to a DWORD and add the size of the file.
The PE checksum algorithm above is effectively the same as the original MS-DOS checksum algorithm. The only differences are the location to skip and replacing the XOR 0xFFFF at the end and adding the size of the file instead.
From my WinPEFile class for PHP, the above algorithm looks like:
The Java example is not entirely correct. Following Java implementation corresponds with the result of Microsoft's original implementation from
Imagehlp.MapFileAndCheckSumA
.It's important that the input bytes are getting masked with
inputByte & 0xff
and the resultinglong
masked again when it's used in the addition term withcurrentWord & 0xffffffffL
(consider the L):In this case, Java is a bit inconvenient.
Java code below from emmanuel may not work. In my case it hangs and does not complete. I believe this is due to the heavy use of IO in the code: in particular the data.read()'s. This can be swapped with an array as solution. Where the RandomAccessFile fully or incrementally reads the file into a byte array(s).
I attempted this but the calculation was too slow due to the conditional for the checksum offset to skip the checksum header bytes. I would imagine that the OP's C# solution would have a similar problem.
The below code removes this also.
public static long computeChecksum(RandomAccessFile data, int checksumOffset) throws IOException {
I still however think that code was too verbose and clunky so I swapped out the raf with a channel and rewrote the culprit bytes to zero's to eliminate the conditional. This code could still probably do with a cache style buffered read.
I was trying to solve the same issue in Java. Here is Mark's solution translated into Java, using a RandomAccessFile instead of a byte array as input:
If you need short unsafe... (Not need use Double and Long integers and not need Array aligning inside algorithm)
The code in the forum post is not strictly the same as what was noted during the actual disassembly of the Windows PE code. The CodeProject article you reference gives the "fold 32-bit value into 16 bits" as:
Which you could translate into C# as: