I know this has been asked before, but I'm looking at this particular solution listed here:
int BitCount(unsigned int u)
{
unsigned int uCount;
uCount = u - ((u >> 1) & 033333333333) - ((u >> 2) & 011111111111);
return ((uCount + (uCount >> 3)) & 030707070707) % 63;
}
How does it work?
Are there any caveats involved here?
Theoretically is it possible to find the answer in constant time? I mean don't we actually have to iterate through the bits to count?
Iterating the bits is constant time since the number of bits in a type is constant.
So a solution that checks a mask of one bit and shifts for each bit in the target value is indeed
O(1)
(e.g. where the constant is 32).Counting bits set, in parallel shows how this is done. That method can be used for 8-, 16-, 32-, 64-, 128-, etc. bit words, although the constants used in the calculations change.
When we say that this operation is O(1), we mean that it can be done in constant time regardless of the word size. Counting the bits the naive way is O(n) in the number of bits.
Speaking practically, this is only O(1) when the processor can work with the word size natively.
As for how it works, it uses "magic numbers." See this newsgroup post for an explanation.
Counting bits
An unsigned 32-bit integer
u
can be written like this:u = a31 * 231 + a30 * 230 + ... + a0 * 20
We want the value of
a31 + a30 + ... + a0
.Let's compare the values of
u >> k
:We will calculate the bit population by this formula:
Let's see why this works:
What's the value of
q
? Let's calculate it bit by bit, looking at the values foru >> k
above. Fora31
, it's:Or for
a30
:We find:
q = a31 * (231 - 1) + a30 * (230 - 1) + ...
And thus
Counting bits in 3-bit blocks
This algorithm starts by doing the same thing, but in blocks of 3 bits:
Taking the difference of these, by the above algorithm, each octet in
uCount
contains the number of bits set in the corresponding octet inu
.So
uCount + (uCount >> 3)
is(λ+κ) * 20 + (κ+ι) * 23 + (ι+θ) * 26 + ...
By ANDing with
0o30707070707
, we mask away every other octet, so that we only count each pair once:This is a base-64 number, and we want to sum up the base-64 digits to get
α+β+γ+δ+ε+ζ+η+θ+ι+κ+λ
, our final result. To do this, we calculate its base-64 digital root: knowing that the result can never be bigger than 32, we just modulo the number by 63.The fastest way to do this is popcnt instruction. You can access it through compiler intrinsic usually. Your solution can be useful on platforms that lack this instruction.