SSE instruction to check if byte array is zeroes C

2019-02-25 10:22发布

Suppose I have a byte[] and want to check if all bytes are zeros. For loop is an obvious way to do it, and LINQ All() is a fancy way to do it, but highest performance is critical.

How can I use Mono.Simd to speed up checking if byte array is full of zeroes? I am looking for cutting edge approach, not merely correct solution.

1条回答
一夜七次
2楼-- · 2019-02-25 10:48

Best code is presented below. Other methods and time measuring are available in full source.

static unsafe bool BySimdUnrolled (byte[] data)
{
    fixed (byte* bytes = data) {
        int len = data.Length;
        int rem = len % (16 * 16);
        Vector16b* b = (Vector16b*)bytes;
        Vector16b* e = (Vector16b*)(bytes + len - rem);
        Vector16b zero = Vector16b.Zero;

        while (b < e) {
            if ((*(b) | *(b + 1) | *(b + 2) | *(b + 3) | *(b + 4) |
                *(b + 5) | *(b + 6) | *(b + 7) | *(b + 8) |
                *(b + 9) | *(b + 10) | *(b + 11) | *(b + 12) | 
                *(b + 13) | *(b + 14) | *(b + 15)) != zero)
                return false;
            b += 16;
        }

        for (int i = 0; i < rem; i++)
            if (data [len - 1 - i] != 0)
                return false;

        return true;
    }
}

Eventually it was beaten by this code:

static unsafe bool ByFixedLongUnrolled (byte[] data)
{
    fixed (byte* bytes = data) {
        int len = data.Length;
        int rem = len % (sizeof(long) * 16);
        long* b = (long*)bytes;
        long* e = (long*)(bytes + len - rem);

        while (b < e) {
            if ((*(b) | *(b + 1) | *(b + 2) | *(b + 3) | *(b + 4) |
                *(b + 5) | *(b + 6) | *(b + 7) | *(b + 8) |
                *(b + 9) | *(b + 10) | *(b + 11) | *(b + 12) | 
                *(b + 13) | *(b + 14) | *(b + 15)) != 0)
                return false;
            b += 16;
        }

        for (int i = 0; i < rem; i++)
            if (data [len - 1 - i] != 0)
                return false;

        return true;
    }
}

Time measurements (on 256MB array):

LINQ All(b => b == 0)                   : 6350,4185 ms
Foreach over byte[]                     : 580,4394 ms
For with byte[].Length property         : 809,7283 ms
For with Length in local variable       : 407,2158 ms
For unrolled 16 times                   : 334,8038 ms
For fixed byte*                         : 272,386 ms
For fixed byte* unrolled 16 times       : 141,2775 ms
For fixed long*                         : 52,0284 ms
For fixed long* unrolled 16 times       : 25,9794 ms
SIMD Vector16b equals Vector16b.Zero    : 56,9328 ms
SIMD Vector16b also unrolled 16 times   : 32,6358 ms

Conclusions:

  • Mono.Simd has only a limited set of instructions. I found no instructions for computing scalar sum(vector) or max(vector). There is however vector equality operator returning bool.
  • Loop unrolling is a powerful technique. Even fastest code benefits a lot from using it.
  • LINQ is ungodly slow because it uses delegates from lambda expressions. If you need cutting edge performance then clearly that is not the way to go.
  • All methods presented use short circuit evaluation, meaning they end as soon as they encounter non-zero.
  • SIMD code was eventually beaten. There are other questions on SO disputing whether SIMD actually makes things faster.

Posted this code on Peer Review, so far 2 bugs found and fixed.

查看更多
登录 后发表回答