Is there a clever (ie: branchless) way to "compact" a hex number. Basically move all the 0s all to one side?
eg:
0x10302040 -> 0x13240000
or
0x10302040 -> 0x00001324
I looked on Bit Twiddling Hacks but didn't see anything.
It's for a SSE numerical pivoting algorithm. I need to remove any pivots that become 0. I can use _mm_cmpgt_ps
to find good pivots, _mm_movemask_ps
to convert that in to a mask, and then bit hacks to get something like the above. The hex value gets munged in to a mask for a _mm_shuffle_ps
instruction to perform a permutation on the SSE 128 bit register.
I came up with the following solution. Please take a look, maybe it will help you.
Output:
To compute mask for
_pext
:First do bit-or on pairs of bits, then on quads. Masks prevent shifted values from overflowing to other digits.
After computing mask this way or harold's way (which is probably faster) you don't need the full power of
_pext
, so if targeted hardware doesn't support it you can replace it with this:Each iteration moves all nibbles one digit to the right if there is some space.
stay_mask
marks bits that are in their final positions. This uses somewhat less operations than Hacker's Delight solution, but might still benefit from branching.Supposing we can use
_pext_u32
, the issue then is computing a mask that has an F for every nibble that isn't zero. I'm not sure what the best approach is, but you can compute the OR of the 4 bits of the nibble and then "spread" it back out to F's like this:Then use that as the mask of
_pext_u32
._pext_u32
can be emulated by this (taken from Hacker's Delight, figure 7.6)But that's a bit of a disaster. It's probably better to just resort to branching code then.
I think this (or something similar) is what you're looking for. Eliminating the 0 nibbles within a number. I've not debugged it, and it would only works on one side atm.
If your processor supports conditional instruction execution, you may get a benefit from this algorithm:
This looks like a good candidate for loop unrolling.
The algorithm assumes that when the original value is shifted left, zeros are shifted in, filling in the "empty" bits.
Edit 1: On a processor that supports conditional execution of instructions, the shifting of the original value would be conditionally executed depending on the result of the ANDing of the original value and the mask. Thus no branching, only ignored instructions.