Unset the most significant bit in a word (int32) [

2019-02-14 14:32发布

问题:

How can I unset the most significant setted bit of a word (e.g. 0x00556844 -> 0x00156844)? There is a __builtin_clz in gcc, but it just counts the zeroes, which is unneeded to me. Also, how should I replace __builtin_clz for msvc or intel c compiler?

Current my code is

 int msb = 1<< ((sizeof(int)*8)-__builtin_clz(input)-1);
 int result = input & ~msb;

UPDATE: Ok, if you says that this code is rather fast, I'll ask you, how should I add a portability to this code? This version is for GCC, but MSVC & ICC?

回答1:

Just round down to the nearest power of 2 and then XOR that with the original value, e.g. using flp2() from Hacker's Delight:

uint32_t flp2(uint32_t x) // round x down to nearest power of 2
{
    x = x | (x >> 1); 
    x = x | (x >> 2); 
    x = x | (x >> 4); 
    x = x | (x >> 8); 
    x = x | (x >>16); 
    return x - (x >> 1); 
}

uint32_t clr_msb(uint32_t x) // clear most significant set bit in x
{
    msb = flp2(x);  // get MS set bit in x
    return x ^ msb; // XOR MS set bit to clear it
}


回答2:

If you are truly concerned with performance, the best way to clear the msb has recently changed for x86 with the addition of BMI instructions.

In x86 assembly:

clear_msb:
    bsrq    %rdi, %rax
    bzhiq   %rax, %rdi, %rax
    retq

Now to rewrite in C and let the compiler emit these instructions while gracefully degrading for non-x86 architectures or older x86 processors that don't support BMI instructions.

Compared to the assembly code, the C version is really ugly and verbose. But at least it meets the objective of portability. And if you have the necessary hardware and compiler directives (-mbmi, -mbmi2) to match, you're back to the beautiful assembly code after compilation.

As written, bsr() relies on a GCC/Clang builtin. If targeting other compilers you can replace with equivalent portable C code and/or different compiler-specific builtins.

#include <inttypes.h>
#include <stdio.h>

uint64_t bsr(const uint64_t n)
{
        return 63 - (uint64_t)__builtin_clzll(n);
}

uint64_t bzhi(const uint64_t n,
              const uint64_t index)
{
        const uint64_t leading = (uint64_t)1 << index;
        const uint64_t keep_bits = leading - 1;
        return n & keep_bits;
}

uint64_t clear_msb(const uint64_t n)
{
        return bzhi(n, bsr(n));
}

int main(void)
{
        uint64_t i;
        for (i = 0; i < (uint64_t)1 << 16; ++i) {
                printf("%" PRIu64 "\n", clear_msb(i));
        }
        return 0;
}

Both assembly and C versions lend themselves naturally to being replaced with 32-bit instructions, as the original question was posed.



回答3:

You can do

unsigned resetLeadingBit(uint32_t x) {
    return x & ~(0x80000000U >> __builtin_clz(x))
}

For MSVC there is _BitScanReverse, which is 31-__builtin_clz().

Actually its the other way around, BSR is the natural x86 instruction, and the gcc intrinsic is implemented as 31-BSR.