How can I count the number of leading zeros in a 128-bit integer (uint128_t
) efficiently?
I know GCC's built-in functions:
__builtin_clz
,__builtin_clzl
,__builtin_clzll
__builtin_ffs
,__builtin_ffsl
,__builtin_ffsll
However, these functions only work with 32- and 64-bit integers.
I also found some SSE instructions:
__lzcnt16
,__lzcnt
,__lzcnt64
As you may guess, these only work with 16-, 32- and 64-bit integers.
Is there any similar, efficient built-in functionality for 128-bit integers?
Yakk's answer works well for all kinds of targets as long as gcc supports 128 bit integers for the target. However, note that on the x86-64 platform, with an Intel Haswell processor or newer, there is a more efficient solution:
The _lzcnt_u64 intrinsic compiles (gcc 5.4) to the lzcnt instruction, which is well defined for a zero input (it returns 64), in contrary to gcc's __builtin_clzll(). The ternary operator compiles to the cmove instruction.
this is a branch free variant. Note that more work is done than in the branchy solution, and in practice the branching will probably be predictable.
It also relies on
__builtin_clzll
not crashing when fed 0: the docs say the result is undefined, but is it just unspecified or undefined?Assuming a 'random' distribution, the first non-zero bit will be in the high 64 bits, with an overwhelming probability, so it makes sense to test that half first.
Have a look at the code generated for:
I would expect gcc to implement each
__builtin_clzll
using thebsrq
instruction - bit scan reverse, i.e., most-significant bit position - in conjunction with anxor
,(msb ^ 63)
, orsub
,(63 - msb)
, to turn it into a leading zero count. gcc might generatelzcnt
instructions with the right-march=
(architecture) options.Edit: others have pointed out that the 'distribution' is not relevant in this case, since the HI uint64_t needs to be tested regardless.