I need to check that all vector elements are non-zero. So far I have found the following solution. Is there a better way to do this? I am using gcc 4.8.2 on Linux/x86_64, with instructions up to SSE4.2.

    typedef char ChrVect __attribute__((vector_size(16), aligned(16)));
    inline bool testNonzero(ChrVect vect)
    {
        const ChrVect vzero = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0};
        return (0 == (__int128_t)(vzero == vect));
    }

Update: the code above compiles to the following assembly (when compiled as a non-inline function):

    movdqa  %xmm0, -24(%rsp)
    pxor    %xmm0, %xmm0
    pcmpeqb -24(%rsp), %xmm0
    movdqa  %xmm0, -24(%rsp)
    movq    -24(%rsp), %rax
    orq     -16(%rsp), %rax
    sete    %al
    ret

With straight SSE intrinsics you might do it like this:
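A minimal sketch of such an intrinsics version. It keeps the name `testNonzero` from the question but takes an `__m128i`, and it assumes the `__SSE4_1__` macro (which gcc and clang define under `-msse4.1`) is what selects the PTEST path; without it the code falls back to PMOVMSKB, which only needs SSE2:

```cpp
#include <emmintrin.h>   // SSE2: _mm_cmpeq_epi8, _mm_movemask_epi8
#ifdef __SSE4_1__
#include <smmintrin.h>   // SSE4.1: _mm_testz_si128 (PTEST)
#endif

// Returns true when none of the 16 bytes in v is zero.
inline bool testNonzero(__m128i v)
{
    // Each byte of vcmp is 0xFF where v has a zero byte, 0x00 elsewhere.
    const __m128i vcmp = _mm_cmpeq_epi8(v, _mm_setzero_si128());
#ifdef __SSE4_1__
    // PTEST: yields 1 when (vcmp AND vcmp) is all zero, i.e. no zero byte was found.
    return _mm_testz_si128(vcmp, vcmp) != 0;
#else
    // PMOVMSKB: gather the top bit of every byte into a 16-bit mask.
    return _mm_movemask_epi8(vcmp) == 0;
#endif
}
```

Both paths do the same byte-wise compare against zero; they differ only in how the compare result is reduced to a single boolean.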
I suggest looking at what your compiler currently generates for your existing code and then comparing it with this version using intrinsics to see whether there is any significant difference.
With SSE3 (`clang -O3 -msse3`) I get the following for the above function:

The SSE4 version (`clang -O3 -msse4.1`) produces:

Note that the zeroing of `xmm1` will typically be hoisted out of any loop containing this function, so the above sequences should be reduced by one instruction when used inside a loop.
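
To illustrate the loop case, here is a rough sketch in which the zero vector is created once before the loop, which is roughly what the compiler is expected to do on its own after inlining. The helper name `allBytesNonzero` and its buffer/length interface are hypothetical, not something from the question, and it uses only SSE2 so it builds without extra flags on x86_64:

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>

// Hypothetical helper: true when none of the n bytes at data is zero.
inline bool allBytesNonzero(const unsigned char* data, std::size_t n)
{
    const __m128i vzero = _mm_setzero_si128();  // set up once, outside the loop
    std::size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        // Unaligned load, so data needs no particular alignment.
        const __m128i v =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
        // Zero bytes become 0xFF; any set mask bit means a zero byte was seen.
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(v, vzero)) != 0)
            return false;
    }
    for (; i < n; ++i)  // scalar tail for the last n % 16 bytes
        if (data[i] == 0)
            return false;
    return true;
}
```

Inside the loop body only the compare and the mask extraction remain per 16-byte block; whether that matters in practice is something to confirm by inspecting the generated code.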