There's got to be a faster and better way to swap the bytes of 16-bit words than this:
public static void Swap(byte[] data)
{
    for (int i = 0; i < data.Length; i += 2)
    {
        byte b = data[i];
        data[i] = data[i + 1];
        data[i + 1] = b;
    }
}
Does anyone have an idea?
The next method was, in my tests, almost 3 times faster than the accepted answer. (It was always faster on more than three characters, i.e. six bytes, and a bit slower at or below that size.) (Note that the accepted answer can read/write outside the bounds of the array.)
(Update: once you have a pointer, there's no need to call the property to get the length. Using that pointer is a bit faster, but requires either a runtime check or, as in the next example, a project configuration that builds for each platform. Define X86 and X64 under each configuration.)
Five tests with 300,000 times 8192 bytes
Five tests with 50,000,000 times 6 bytes
But if the data is large and performance really matters, you could use SSE or AVX. (13 times faster.) https://pastebin.com/WaFk275U
Test 5 times, 100000 loops with 8192 bytes or 4096 chars
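To give a rough idea of what a SIMD version can look like, here is a minimal sketch using the SSSE3 byte shuffle from `System.Runtime.Intrinsics` (.NET Core 3.0 or later). It is not the code from the pastebin link; the class name `SimdSwapSketch` and the scalar fallback are my own assumptions, and I have not benchmarked it.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper, for illustration only.
public static class SimdSwapSketch
{
    // Swaps the two bytes of every 16-bit word in place.
    // A trailing odd byte, if any, is left unchanged.
    public static void Swap(Span<byte> data)
    {
        int done = 0;

        if (Ssse3.IsSupported)
        {
            // Shuffle mask: output byte k takes input byte mask[k],
            // so each adjacent pair of bytes is exchanged.
            Vector128<byte> mask = Vector128.Create(
                (byte)1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);

            // View the buffer as whole 16-byte blocks; the remainder is handled below.
            Span<Vector128<byte>> blocks = MemoryMarshal.Cast<byte, Vector128<byte>>(data);
            for (int j = 0; j < blocks.Length; j++)
            {
                blocks[j] = Ssse3.Shuffle(blocks[j], mask);
            }
            done = blocks.Length * 16;
        }

        // Scalar loop for the tail, or for the whole buffer when SSSE3 is unavailable.
        for (int i = done; i + 1 < data.Length; i += 2)
        {
            byte b = data[i];
            data[i] = data[i + 1];
            data[i + 1] = b;
        }
    }
}
```

A `byte[]` converts implicitly to `Span<byte>`, so `SimdSwapSketch.Swap(buffer)` works on an ordinary array.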
Well, you could use the XOR swapping trick, to avoid an intermediate byte. It won't be any faster, though, and I wouldn't be surprised if the IL is exactly the same.
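For reference, a minimal sketch of the XOR swap mentioned above, applied to each byte pair. The class name `XorSwapSketch` is made up for the example, and the loop bound is adjusted so that an odd-length array does not read past the end.

```csharp
// Hypothetical helper, for illustration only.
public static class XorSwapSketch
{
    // Swaps the two bytes of each 16-bit word in place without a temporary variable.
    public static void Swap(byte[] data)
    {
        // Stop before a trailing odd byte so data[i + 1] is always in range.
        for (int i = 0; i + 1 < data.Length; i += 2)
        {
            data[i] ^= data[i + 1];      // data[i] now holds a ^ b
            data[i + 1] ^= data[i];      // data[i + 1] now holds a
            data[i] ^= data[i + 1];      // data[i] now holds b
        }
    }
}
```

Each swap costs three read-modify-write operations on the array instead of two reads and two writes, which is one reason not to expect a speedup.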
In my attempt to apply for the Uberhacker award, I submit the following. For my testing, I used a Source array of 8,192 bytes and called SwapX2 100,000 times:
My benchmarking indicates that this version is over 1.8 times faster than the code submitted in the original question.
I always liked this:
I believe you'll find this is the fastest method as well as being fairly readable and safe. Obviously this applies to 64-bit values, but the same technique could be used for 32- or 16-bit values.
This way appears to be slightly faster than the method in the original question:
My benchmarking assumed that the method is called repeatedly, so that the resizing of the _temp array isn't a factor. This method relies on the fact that half of the byte-swapping can be done with the initial Buffer.BlockCopy(...) call (with the source position offset by 1).
Please benchmark this yourselves, in case I've completely lost my mind. In my tests, this method takes approximately 70% as long as the original method (which I modified to declare the byte b outside of the loop).
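A minimal sketch of how the offset-by-one `Buffer.BlockCopy` idea described above can work. The class name `BlockCopySwapSketch`, the way `_temp` is grown, and the final copy back into `data` are assumptions made for the example, not necessarily the exact code that was benchmarked.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class BlockCopySwapSketch
{
    // Reused scratch buffer; not thread-safe (an assumption of this sketch).
    private static byte[] _temp = new byte[0];

    public static void Swap(byte[] data)
    {
        // Only whole 16-bit words are swapped; a trailing odd byte is left alone.
        int n = data.Length & ~1;
        if (n == 0) return;

        if (_temp.Length < n) _temp = new byte[n];

        // Copying from offset 1 puts every odd-indexed source byte
        // into the even-indexed slot just before it: half the swap is done.
        Buffer.BlockCopy(data, 1, _temp, 0, n - 1);

        // The other half: move each even-indexed source byte one slot up.
        for (int i = 0; i < n; i += 2)
        {
            _temp[i + 1] = data[i];
        }

        // Copy the swapped words back over the original data.
        Buffer.BlockCopy(_temp, 0, data, 0, n);
    }
}
```

For example, `{ 1, 2, 3, 4 }` becomes `{ 2, 3, 4, ? }` in `_temp` after the first copy, `{ 2, 1, 4, 3 }` after the loop, and that result is then copied back.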