Faster way to swap endianness in C# with 16 bit wo

2019-03-13 07:00发布

问题:

There's got to be a faster and better way to swap bytes of 16bit words then this.:

public static void Swap(byte[] data)
{
    for (int i = 0; i < data.Length; i += 2)
    {
        byte b = data[i];
        data[i] = data[i + 1];
        data[i + 1] = b;
    }
}

Does anyone have an idea?

回答1:

In my attempt to apply for the Uberhacker award, I submit the following. For my testing, I used a Source array of 8,192 bytes and called SwapX2 100,000 times:

public static unsafe void SwapX2(Byte[] source)  
{  
    fixed (Byte* pSource = &source[0])  
    {  
        Byte* bp = pSource;  
        Byte* bp_stop = bp + source.Length;  

        while (bp < bp_stop)  
        {
            *(UInt16*)bp = (UInt16)(*bp << 8 | *(bp + 1));  
            bp += 2;  
        }  
    }  
}

My benchmarking indicates that this version is over 1.8 times faster than the code submitted in the original question.



回答2:

This way appears to be slightly faster than the method in the original question:

private static byte[] _temp = new byte[0];
public static void Swap(byte[] data)
{
    if (data.Length > _temp.Length)
    {
        _temp = new byte[data.Length];
    }
    Buffer.BlockCopy(data, 1, _temp, 0, data.Length - 1);
    for (int i = 0; i < data.Length; i += 2)
    {
        _temp[i + 1] = data[i];
    }
    Buffer.BlockCopy(_temp, 0, data, 0, data.Length);
}

My benchmarking assumed that the method is called repeatedly, so that the resizing of the _temp array isn't a factor. This method relies on the fact that half of the byte-swapping can be done with the initial Buffer.BlockCopy(...) call (with the source position offset by 1).

Please benchmark this yourselves, in case I've completely lost my mind. In my tests, this method takes approximately 70% as long as the original method (which I modified to declare the byte b outside of the loop).



回答3:

I always liked this:

public static Int64 SwapByteOrder(Int64 value) 
{
  var uvalue = (UInt64)value;
  UInt64 swapped = 
       ( (0x00000000000000FF) & (uvalue >> 56)
       | (0x000000000000FF00) & (uvalue >> 40)
       | (0x0000000000FF0000) & (uvalue >> 24)
       | (0x00000000FF000000) & (uvalue >> 8)
       | (0x000000FF00000000) & (uvalue << 8)
       | (0x0000FF0000000000) & (uvalue << 24)
       | (0x00FF000000000000) & (uvalue << 40)
       | (0xFF00000000000000) & (uvalue << 56));
  return (Int64)swapped;
}

I believe you'll find this is the fastest method as well a being fairly readable and safe. Obviously this applies to 64-bit values but the same technique could be used for 32- or 16-.



回答4:

Next method, in my test, almost 3 times faster as the accepted answer. (Always faster on more than 3 characters or six bytes, a bit slower on less or equal to three characters or six bytes.) (Note that the accepted answer can read/write outside the bounds of the array.)

(Update While having a pointer there's no need to call the property to get the length. Using that pointer is a bit faster, but requires either a runtime check or, as in next example, a project configuration to build for each platform. Define X86 and X64 under each configuration.)

static unsafe void SwapV2(byte[] source)
{
    fixed (byte* psource = source)
    {
#if X86
        var length = *((uint*)(psource - 4)) & 0xFFFFFFFEU;
#elif X64
        var length = *((uint*)(psource - 8)) & 0xFFFFFFFEU;
#else
        var length = (source.Length & 0xFFFFFFFE);
#endif
        while (length > 7)
        {
            length -= 8;
            ulong* pulong = (ulong*)(psource + length);
            *pulong = ( ((*pulong >> 8) & 0x00FF00FF00FF00FFUL)
                      | ((*pulong << 8) & 0xFF00FF00FF00FF00UL));
        }
        if(length > 3)
        {
            length -= 4;
            uint* puint = (uint*)(psource + length);
            *puint = ( ((*puint >> 8) & 0x00FF00FFU)
                     | ((*puint << 8) & 0xFF00FF00U));
        }
        if(length > 1)
        {
            ushort* pushort = (ushort*)psource;
            *pushort = (ushort) ( (*pushort >> 8)
                                | (*pushort << 8));
        }
    }
}

Five tests with 300.000 times 8192 bytes

  • SwapV2: 1055, 1051, 1043, 1041, 1044
  • SwapX2: 2802, 2803, 2803, 2805, 2805

Five tests with 50.000.000 times 6 bytes

  • SwapV2: 1092, 1085, 1086, 1087, 1086
  • SwapX2: 1018, 1019, 1015, 1017, 1018

But if the data is large and performance really matters, you could use SSE or AVX. (13 times faster.) https://pastebin.com/WaFk275U

Test 5 times, 100000 loops with 8192 bytes or 4096 chars

  • SwapX2 : 226, 223, 225, 226, 227 Min: 223
  • SwapV2 : 113, 111, 112, 114, 112 Min: 111
  • SwapA2 : 17, 17, 17, 17, 16 Min: 16


回答5:

Well, you could use the XOR swapping trick, to avoid an intermediate byte. It won't be any faster, though, and I wouldn't be surprised if the IL is exactly the same.

for (int i = 0; i < data.Length; i += 2)
{
    data[i] ^= data[i + 1];
    data[i + 1] ^= data[i];
    data[i] ^= data[i + 1];
}


标签: c# endianness