Counting the number of leading zeros in a 128-bit

2020-03-04 08:01发布

How can I count the number of leading zeros in a 128-bit integer (uint128_t) efficiently?

I know GCC's built-in functions:

  • __builtin_clz, __builtin_clzl, __builtin_clzll
  • __builtin_ffs, __builtin_ffsl, __builtin_ffsll

However, these functions only work with 32- and 64-bit integers.

I also found some SSE instructions:

  • __lzcnt16, __lzcnt, __lzcnt64

As you may guess, these only work with 16-, 32- and 64-bit integers.

Is there any similar, efficient built-in functionality for 128-bit integers?

3条回答
Juvenile、少年°
2楼-- · 2020-03-04 08:35

Yakk's answer works well for all kinds of targets as long as gcc supports 128 bit integers for the target. However, note that on the x86-64 platform, with an Intel Haswell processor or newer, there is a more efficient solution:

#include <immintrin.h>
#include <stdint.h>
// tested with compiler options: gcc -O3 -Wall -m64  -mlzcnt

inline int lzcnt_u128 (unsigned __int128 u) {
  uint64_t hi = u>>64;
  uint64_t lo = u;
  lo = (hi == 0) ? lo : -1ULL;
  return _lzcnt_u64(hi) + _lzcnt_u64(lo);
}

The _lzcnt_u64 intrinsic compiles (gcc 5.4) to the lzcnt instruction, which is well defined for a zero input (it returns 64), in contrary to gcc's __builtin_clzll(). The ternary operator compiles to the cmove instruction.

查看更多
▲ chillily
3楼-- · 2020-03-04 08:45
inline int clz_u128 (uint128_t u) {
  uint64_t hi = u>>64;
  uint64_t lo = u;
  int retval[3]={
    __builtin_clzll(hi),
    __builtin_clzll(lo)+64,
    128
  };
  int idx = !hi + ((!lo)&(!hi));
  return retval[idx];
}

this is a branch free variant. Note that more work is done than in the branchy solution, and in practice the branching will probably be predictable.

It also relies on __builtin_clzll not crashing when fed 0: the docs say the result is undefined, but is it just unspecified or undefined?

查看更多
再贱就再见
4楼-- · 2020-03-04 08:45

Assuming a 'random' distribution, the first non-zero bit will be in the high 64 bits, with an overwhelming probability, so it makes sense to test that half first.

Have a look at the code generated for:

/* inline */ int clz_u128 (uint128_t u)
{
    unsigned long long hi, lo; /* (or uint64_t) */
    int b = 128;

    if ((hi = u >> 64) != 0) {
        b = __builtin_clzll(hi);
    }
    else if ((lo = u & ~0ULL) != 0) {
        b = __builtin_clzll(lo) + 64;
    }

    return b;
}

I would expect gcc to implement each __builtin_clzll using the bsrq instruction - bit scan reverse, i.e., most-significant bit position - in conjunction with an xor, (msb ^ 63), or sub, (63 - msb), to turn it into a leading zero count. gcc might generate lzcnt instructions with the right -march= (architecture) options.


Edit: others have pointed out that the 'distribution' is not relevant in this case, since the HI uint64_t needs to be tested regardless.

查看更多
登录 后发表回答