Faster way to convert byte array to int

2019-03-22 06:28发布

站内文章 / 移动开发

55 0

啃猪蹄的小仙女

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

Is there a faster way than BitConverter.ToInt32 to convert a byte array to an int value?

回答1:

If I remember correctly, that implementation uses unsafe code (treating a byte* as an int*), so it will be hard to beat, but the other alternative is shifting.

However, from lots of work in this area, this is so unlikely to be a genuine bottleneck as to be irrelevant. I/O is the main issue, typically.

GetBytes(int), however, is more expensive (in high volume) due to array / heap allocation.

回答2:

I actually tried several different ways to convert four bytes to an int:

BitConverter.ToInt32(new byte[] { w, x, y, z }, 0);
BitConverter.ToUInt32(new byte[] { w, x, y, z }, 0);
b = new byte[] { w, x, y, z }; BitConverter.ToInt32(b, 0);
b = new byte[] { 1, 2, 3, 4, 5, 6, 7, w, x, y, z }; BitConverter.ToInt32(b, 7);
w | (x << 8) | (y << 16) | (z << 24);
b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24);

I ran 10^9 iterations of each one in a Release (x86) build not on under a debugger on a 2.5 GHz Core i7 laptop. Here are my results (note that the methods that don't use BitConverter are substantially faster):

test1: 00:00:15.5287282 67305985
test2: 00:00:15.1334457 67305985
test3: 00:00:08.0648586 67305985
test4: 00:00:11.2307059 67305985
test5: 00:00:02.0219417 67305985
test6: 00:00:01.6275684 67305985

Some conclusions you can draw:

test1 shows that on my laptop it's hard to make the conversion go slower than 15ns, which I hate to say should be fast enough for anyone. (Do you need to call it more than 60M times per second?)
test2 shows that using uint instead of int saves a small amount of time. I'm not sure why, but I think it's small enough to be experimental error.
test3 shows that the overhead of creating a new byte array (7ns) is as nearly as much as calling the function, but is still faster than making a new array out of the old array.
test4 shows that making unaligned array accesses from ToInt32 adds overhead (3ns)
test5 shows that pulling the 4 bytes from local variables and combining them yourself is several times faster than calling ToInt32.
test6 shows that it's actually slightly faster to pull the 4 bytes from an array than from function arguments! I suspect this is due to CPU pipelining or cache effects.

The fastest, test6, took only twice as long to run as an empty loop (not shown). In other words, it took less than 1ns to perform each conversion. Good luck getting any useful calculation to go faster than that!

Here's my test program:

using System;

namespace BitConverterTest
{
    class Program
    {
        const int iters = 1000000000;
        static void Main(string[] args)
        {
            test1(1, 2, 3, 4);
            test2(1, 2, 3, 4);
            test3(1, 2, 3, 4);
            test4(1, 2, 3, 4);
            test5(1, 2, 3, 4);
            test6(1, 2, 3, 4);
        }

        static void test1(byte w, byte x, byte y, byte z)
        {
            int res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            for (int i = 0; i < iters; i++)
                res = BitConverter.ToInt32(new byte[] { w, x, y, z }, 0);
            Console.WriteLine("test1: " + timer.Elapsed + " " + res);
        }

        static void test2(byte w, byte x, byte y, byte z)
        {
            uint res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            for (int i = 0; i < iters; i++)
                res = BitConverter.ToUInt32(new byte[] { w, x, y, z }, 0);
            Console.WriteLine("test2: " + timer.Elapsed + " " + res);
        }

        static void test3(byte w, byte x, byte y, byte z)
        {
            int res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            var b = new byte[] { w, x, y, z };
            for (int i = 0; i < iters; i++)
                res = BitConverter.ToInt32(b, 0);
            Console.WriteLine("test3: " + timer.Elapsed + " " + res);
        }

        static void test4(byte w, byte x, byte y, byte z)
        {
            int res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            var b = new byte[] { 1, 2, 3, 4, 5, 6, 7, w, x, y, z };
            for (int i = 0; i < iters; i++)
                res = BitConverter.ToInt32(b, 7);
            Console.WriteLine("test4: " + timer.Elapsed + " " + res);
        }

        static void test5(byte w, byte x, byte y, byte z)
        {
            int res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            var b = new byte[] { w, x, y, z };
            for (int i = 0; i < iters; i++)
                res = w | (x << 8) | (y << 16) | (z << 24);
            Console.WriteLine("test5: " + timer.Elapsed + " " + res);
        }

        static void test6(byte w, byte x, byte y, byte z)
        {
            int res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            var b = new byte[] { w, x, y, z };
            for (int i = 0; i < iters; i++)
                res = b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24);
            Console.WriteLine("test6: " + timer.Elapsed + " " + res);
        }
    }
}

回答3:

Followup to Gabe's performance tests:

Changes:

Eliminate tests 1 and 2, because the inline array creation made these tests of the GC (as can be seen from the Gen 0 GC performance counter).
Eliminate test 4 (non-aligned array) to keep things simpler.
Add tests 7 and 8 which do conversions from a large array (256 MB) via BitConverter and bit fiddling respectively.
Add running total to tests to try and avoid common sub-expression elimination, which clearly lead to the low times in Gabe's tests 5 and 6.

Results:

32-bit option:

test3: 00:00:06.9230577
test5: 00:00:03.8349386
test6: 00:00:03.8238272
test7: 00:00:07.3898489
test8: 00:00:04.6807391

64-bit option:

test3: 00:00:05.8794322
test5: 00:00:00.4384600
test6: 00:00:00.4069573
test7: 00:00:06.2279365
test8: 00:00:03.5472486

Analysis

Still getting common sub-expression elimination in 5 and 6 on 64-bit.
For this 64 bit is a win. But such a micro-benchmark shouldn't be followed for choosing where to optimise an application.
It looks like about a 50% improvement when converting 256 MB of random data into ints. As the test does it 16 times, that's less that 0.2s—unlikely to make a real difference outside a very narrow subset of applications, and then you have the additional maintenance cost of ensuring that someone doesn't break the code over the application lifetime.
I wonder how much of the BitConverter overhead is the parameter checks it does?
Test 6 is only a little faster than 5. Clearly array bounds checks are being eliminated.

The Code

using System;

namespace BitConverterTest {
    class Program {
        const int iters = 1024*1024*1024;
        const int arrayLen = iters/4;
        static byte[] array = new byte[arrayLen];

        static void Main(string[] args) {
            //test1(1, 2, 3, 4);
            //test2(1, 2, 3, 4);
            test3(1, 2, 3, 4);
            //test4(1, 2, 3, 4);
            test5(1, 2, 3, 4);
            test6(1, 2, 3, 4);

            // Fill array with good PRNG data
            var rng = new System.Security.Cryptography.RNGCryptoServiceProvider();
            rng.GetBytes(array);

            test7();
            test8();
        }

        // BitConverter with aligned input
        static void test3(byte w, byte x, byte y, byte z) {
            int res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            var b = new byte[] { w, x, y, z };
            for (int i = 0; i < iters; i++)
                res = BitConverter.ToInt32(b, 0);
            Console.WriteLine("test3: " + timer.Elapsed + " " + res);
        }

        // Inline bitfiddling with separate variables.
        static void test5(byte w, byte x, byte y, byte z) {
            long res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            var b = new byte[] { w, x, y, z };
            for (int i = 0; i < iters; i++) {
                int a = w | (x << 8) | (y << 16) | (z << 24);
                res += a;
            }
            Console.WriteLine("test5: " + timer.Elapsed + " " + res);
        }

        // Inline bitfiddling with array elements.
        static void test6(byte w, byte x, byte y, byte z) {
            long res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            var b = new byte[] { w, x, y, z };
            for (int i = 0; i < iters; i++) {
                int a = b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24);
                res += a;
            }
            Console.WriteLine("test6: " + timer.Elapsed + " " + res);
        }

        // BitConvert from large array...
        static void test7() {
            var its = iters/arrayLen * 4; // *4 to remove arrayLen/4 factor.
            var timer = System.Diagnostics.Stopwatch.StartNew();
            long res = 0;
            for (var outer = 0; outer < its; outer++) {
                for (var pos = 0; pos < arrayLen; pos += 4) {
                    var x = BitConverter.ToInt32(array, pos);
                    res += x;
                }
            }
            Console.WriteLine("test7: " + timer.Elapsed + " " + res);
        }

        // Bitfiddle from large array...
        static void test8() {
            var its = iters/arrayLen * 4;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            long res = 0;
            for (var outer = 0; outer < its; outer++) {
                for (var pos = 0; pos < arrayLen; pos += 4) {
                    int x = array[pos] | (array[pos+1] << 8) | (array[pos+2] << 16) | (array[pos+3] << 24);
                    res += x;
                }
            }
            Console.WriteLine("test8: " + timer.Elapsed + " " + res);
        }
    }
}

回答4:

Based on a quick review of the implementation of BitConverter.ToInt32 in .NET Reflector I would say "No".

It optimises for the case where the array is aligned and directly casts the bytes, otherwise it performs a bitwise merge.

回答5:

I have also fiddled with similar issues.

In my case it was how to convert to single precision floats when data is stored as double precision byte[]s, or just between the double representation and the byte[] representation etc. The best is not to go through too many API layers if one wants to achieve the best performance on large sets of data, and to embed as much info as you can into the algo as possible without making it too brittle or incomprehensible.

So, to further follow up from Richard's tests, I add another test below (test9) which is the way I've gone in my own work and answers his point 4 in his Analysis section:

Use unsafe memory pointer accessing to achieve the most performant result. Something that comes naturally if you use c++, but not necessarily c#. This is similar to what BitConverter is doing under the hood, but without the parameter and safety checks (as, of course, we know what we are doing... ;)

Results:

32-bit option:

test3: 00:00:06.2373138
test5: 00:00:03.1193338
test6: 00:00:03.1609287
test7: 00:00:07.7328020
test8: 00:00:06.4192130
test9: 00:00:03.9590307

64-bit option:

test3: 00:00:06.2209098
test5: 00:00:00.5563930
test6: 00:00:01.5486780
test7: 00:00:08.4858474
test8: 00:00:05.4991740
test9: 00:00:02.2928944

Here the same code, including the new test9:

using System;

namespace BitConverterTest
{
    class Program
    {
        const int iters = 1024 * 1024 * 1024;
        const int arrayLen = iters / 4;
        static byte[] array = new byte[arrayLen];

        static void Main(string[] args)
        {
            //test1(1, 2, 3, 4);
            //test2(1, 2, 3, 4);
            test3(1, 2, 3, 4);
            //test4(1, 2, 3, 4);
            test5(1, 2, 3, 4);
            test6(1, 2, 3, 4);

            // Fill array with good PRNG data
            var rng = new System.Security.Cryptography.RNGCryptoServiceProvider();
            rng.GetBytes(array);

            test7();
            test8();
            test9();
        }

        // BitConverter with aligned input
        static void test3(byte w, byte x, byte y, byte z)
        {
            int res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            var b = new byte[] { w, x, y, z };
            for (int i = 0; i < iters; i++)
                res = BitConverter.ToInt32(b, 0);
            Console.WriteLine("test3: " + timer.Elapsed + " " + res);
        }

        // Inline bitfiddling with separate variables.
        static void test5(byte w, byte x, byte y, byte z)
        {
            long res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            var b = new byte[] { w, x, y, z };
            for (int i = 0; i < iters; i++)
            {
                int a = w | (x << 8) | (y << 16) | (z << 24);
                res += a;
            }
            Console.WriteLine("test5: " + timer.Elapsed + " " + res);
        }

        // Inline bitfiddling with array elements.
        static void test6(byte w, byte x, byte y, byte z)
        {
            long res = 0;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            var b = new byte[] { w, x, y, z };
            for (int i = 0; i < iters; i++)
            {
                int a = b[0] | (b[1] << 8) | (b[2] << 16) | (b[3] << 24);
                res += a;
            }
            Console.WriteLine("test6: " + timer.Elapsed + " " + res);
        }

        // BitConvert from large array...
        static void test7()
        {
            var its = iters / arrayLen * 4; // *4 to remove arrayLen/4 factor.
            var timer = System.Diagnostics.Stopwatch.StartNew();
            long res = 0;
            for (var outer = 0; outer < its; outer++)
            {
                for (var pos = 0; pos < arrayLen; pos += 4)
                {
                    var x = BitConverter.ToInt32(array, pos);
                    res += x;
                }
            }
            Console.WriteLine("test7: " + timer.Elapsed + " " + res);
        }

        // Bitfiddle from large array...
        static void test8()
        {
            var its = iters / arrayLen * 4;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            long res = 0;
            for (var outer = 0; outer < its; outer++)
            {
                for (var pos = 0; pos < arrayLen; pos += 4)
                {
                    int x = array[pos] | (array[pos + 1] << 8) | (array[pos + 2] << 16) | (array[pos + 3] << 24);
                    res += x;
                }
            }
            Console.WriteLine("test8: " + timer.Elapsed + " " + res);
        }

        // unsafe memory operations from large array...
        // (essentialy internals of BitConverter without param checks, etc)
        static unsafe void test9()
        {
            var its = iters / arrayLen * 4;
            var timer = System.Diagnostics.Stopwatch.StartNew();
            long res = 0;
            int value = 0;
            for (var outer = 0; outer < its; outer++)
            {
                for (var pos = 0; pos < arrayLen; pos += 4)
                {
                    fixed (byte* numPtr = &array[pos])
                    {
                        value = *(int*)numPtr;
                    }
                    int x = *(int*)&value;
                    res += x;
                }
            }
            Console.WriteLine("test9: " + timer.Elapsed + " " + res);
        }

    }
}