In this question, the following code:
public static void Swap(byte[] data)
{
for (int i = 0; i < data.Length; i += 2)
{
byte b = data[i];
data[i] = data[i + 1];
data[i + 1] = b;
}
}
was rewritten in unsafe code to improve its performance:
public static unsafe void SwapX2(Byte[] Source)
{
fixed (Byte* pSource = &Source[0])
{
Byte* bp = pSource;
Byte* bp_stop = bp + Source.Length;
while (bp < bp_stop)
{
*(UInt16*)bp = (UInt16)(*bp << 8 | *(bp + 1));
bp += 2;
}
}
}
Assuming that one wanted to do the same thing with 32 bit words:
public static void SwapX4(byte[] data)
{
byte temp;
for (int i = 0; i < data.Length; i += 4)
{
temp = data[i];
data[i] = data[i + 3];
data[i + 3] = temp;
temp = data[i + 1];
data[i + 1] = data[i + 2];
data[i + 2] = temp;
}
}
how would this be rewritten in a similar fashion?
This version will not exceed the bounds of the buffer. Works on both Little and Big Endian architectures. And is faster on larger data. (Update: Add build configurations for x86 and x64, predefine X86 for 32 bit(x86) and X64 for 64 bit(x64) and it'll be slightly faster.)
Note that both of these functions (my SwapX4 and your SwapX2) will only swap anything on a little-endian host; when run on a big-endian host, they are an expensive no-op.