I was looking into how Java counts bits set of an int.
I had in my mind something simple like this (which I think is correct):
public static int bitCount(int number){
final int MASK = 0x1;
int count = 0;
for(int i = 0; i < 32; i++){
if(((number >>> i) & MASK) == MASK){
count++;
}
}
return count;
}
Instead I found a method that I absolutely have no idea what is doing (seems like magic to me):
i = i - ((i >>> 1) & 0x55555555);
i = (i & 0x33333333) + ((i >>> 2) & 0x33333333);
i = (i + (i >>> 4)) & 0x0f0f0f0f;
i = i + (i >>> 8);
i = i + (i >>> 16);
return i & 0x3f;
Could someone help understand what this does and why a simple function like the one I came up as first thought is/could be bad?
Sure
The bit pattern of 5 is
0101
(four bits), so the mask is a repetition of the bit pattern01
sixteen times. This line counts the number of bits in every two-bit pair of the number. If you consider two-bit pairs,(i >>> 1) & 0x1
gets the high-order bit in the low-order position. Now, there are four possible two-bit patternsand each two-bit pair now contains the number of bits the original had in those positions.
Next, we count the bits that were in each four-bit group (aka nibble). By masking a nibble with
0x3 = 0011(b)
, we get the count of bits that were in the lower order two bits of the nibble, and(i >>> 2) & 0x3
gets the count from the higher order two bits of the nibble. These counts are now added. Since each count was at most 2, the sum is at most 4, hence doesn't leave the nibble.Now the bits in each octet are counted. Each nibble contains the count of bits set in the original in that place. Shifting right by four bits moves the count from the higher order nibble in each place to the lower order nibble, these are added. Then we also moved the count from the lower order nibble to the higher order nible of the adjacent octet, that is masked out by the
& 0x0f0f0f0f
. Since each octet can have at most eight bits set, the count doesn't leave the lower order nibble of the octet.Now we add the counts of the pairs of adjacent octets.
Now we add the totals of the counts in the high order two octets and the low order two.
Finally, all octets except the lowest order octet are masked out, since the higher order octets still contained intermediate counts.
The reason why your simple implementation might be considered bad is performance. The convoluted bit-hack uses fewer operations and no branches, so it is faster. It is, however, far easier to get wrong.
Another neat way to count the set bits in an
int
(orlong
) isThat works because
n = n & (n-1)
clears the last set bit inn
and leaves everything else untouched. If the bit pattern ofn
ends withthe bit pattern of
n-1
isand the bits before the last 1-bit in
n
are the same. In Java, that is guaranteed to work also for negative numbers because integer overflow has wrap-around semantics, so ifn = Integer.MIN_VALUE
, the bit pattern is100...0
andn - 1
becomesInteger.MAX_VALUE
with the bit pattern011...1
.the cool method only works since the computer has a finite range of values for int. it won't work (and so other cool bits algorithms) so easily for infinite range (like BigInteger) since you need to know all of the needed masks before the calculation .
in any case , you can read how it works via : http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
it's at the bottom of this chapter.