8 bits representing the number 7 look like this:
00000111
Three bits are set.
What are algorithms to determine the number of set bits in a 32-bit integer?
I use the code below, which I find more intuitive.
Logic: `n & (n - 1)` clears the lowest set bit of `n`.
P.S.: I know this is not an O(1) solution, but it is an interesting one.
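The snippet itself was not preserved in this copy; a sketch of the loop the answer describes (the classic clear-lowest-set-bit method) looks like this:

```cpp
// Counts set bits by repeatedly clearing the lowest set bit.
// The loop runs once per set bit, not once per bit position.
inline int countSetBits(unsigned int n) {
    int count = 0;
    while (n != 0) {
        n &= (n - 1);  // n & (n-1) clears the lowest set bit of n
        ++count;
    }
    return count;
}
```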
In my opinion, the "best" solution is the one that can be read by another programmer (or the original programmer two years later) without copious comments. You may well want the fastest or cleverest solution which some have already provided but I prefer readability over cleverness any time.
If you want more speed (and assuming you document it well to help out your successors), you could use a table lookup:
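The lookup-table snippet itself was not preserved; a typical byte-at-a-time version looks like the following (the table-building scheme shown here is my assumption, one common way to fill it):

```cpp
#include <cstdint>

// Byte-at-a-time table lookup: precompute the popcount of every byte
// value once, then sum four lookups per 32-bit word.
struct PopcountTable {
    unsigned char table[256];
    PopcountTable() {
        table[0] = 0;
        for (int i = 1; i < 256; ++i)
            table[i] = static_cast<unsigned char>((i & 1) + table[i / 2]);
    }
};

inline int tablePopcount(std::uint32_t v) {
    static const PopcountTable t;
    return t.table[v & 0xFF] + t.table[(v >> 8) & 0xFF]
         + t.table[(v >> 16) & 0xFF] + t.table[(v >> 24) & 0xFF];
}
```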
These rely on specific data-type sizes, though, so they're not that portable. But since many performance optimisations aren't portable anyway, that may not be an issue. If you want portability, I'd stick to the readable solution.
A few open questions:
We can modify the algorithm to support negative numbers as follows:
Then, to overcome the second problem, we can write the algorithm like this:
For a complete reference, see http://goursaha.freeoda.com/Miscellaneous/IntegerBitCount.html
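The modified snippets referred to above were not preserved. One common way to handle negative inputs (my assumption, not necessarily the linked article's exact code) is to operate on the value's unsigned representation, so the clear-lowest-set-bit loop terminates even when the sign bit is set:

```cpp
// Reinterpret the 32-bit two's-complement pattern as unsigned so the
// n & (n-1) loop terminates for negative inputs too.
// Assumes the usual 32-bit int found on mainstream platforms.
inline int countSetBitsSigned(int n) {
    unsigned int u = static_cast<unsigned int>(n);
    int count = 0;
    while (u != 0) {
        u &= (u - 1);  // clear the lowest set bit
        ++count;
    }
    return count;
}
```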
I think the fastest way—without using lookup tables and popcount—is the following. It counts the set bits with just 12 operations.
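The answer's code block did not survive extraction; the standard 12-operation version it walks through is the well-known SWAR bit-hack, reconstructed here:

```cpp
#include <cstdint>

// Parallel (SWAR) popcount in 12 operations, matching the step-by-step
// explanation that follows.
inline int numberOfSetBits(std::uint32_t i) {
    i = i - ((i >> 1) & 0x55555555u);                  // count per 2-bit pair
    i = (i & 0x33333333u) + ((i >> 2) & 0x33333333u);  // count per nibble
    i = (i + (i >> 4)) & 0x0F0F0F0Fu;                  // count per byte
    return static_cast<int>((i * 0x01010101u) >> 24);  // sum of all bytes
}
```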
It works because you can count the total number of set bits by dividing the number into two halves, counting the set bits in each half, and then adding those counts up, also known as the Divide and Conquer paradigm. Let's get into detail. The count of set bits in a two-bit group can be 0b00, 0b01 or 0b10. Let's try to work this out on 2 bits: 0b00 has 0 set bits, 0b01 and 0b10 each have 1, and 0b11 has 2. If the two-bit number is >= 2 (0b10), then `(n >> 1) & 0b01` produces 0b01, else it produces 0b00; subtracting that from the pair itself leaves the count of its set bits. This statement should be easy to understand. After the first operation we have the count of set bits in every two bits; now we sum up that count in every 4 bits.
We then sum up the above result, giving us the total count of set bits in 4 bits. The last statement is the most tricky.
Let's break it down further...
It's similar to the second statement; we are counting the set bits in groups of 4 instead. We know, because of our previous operations, that every nibble now holds the count of the set bits that were originally in it. Let's look at an example. Suppose we have the byte 0b01000010: the first nibble holds a count of 4 and the second holds a count of 2. Adding the two nibbles with `n + (n >> 4)` gives us the count of set bits in the whole byte, 0b0110 (6), in the low nibble, and therefore we mask away the upper four bits of every byte with 0x0F0F0F0F (discarding them). Now every byte holds the count of its set bits. We need to add them all up. The trick is to multiply the result by 0x01010101, which has an interesting property: if our number has four bytes, A B C D, the multiplication produces a new number with the bytes A+B+C+D, B+C+D, C+D, D. A 4-byte number can have at most 32 bits set, which can be represented as 0b00100000, so the per-byte sums fit in a byte without overflowing. All we need now is the first (most significant) byte, which holds the sum of the set bits of all the bytes, and we get it with `>> 24`. This algorithm was designed for 32-bit words but can easily be modified for 64-bit words.

Also consider the built-in functions of your compilers.
On the GNU compiler for example you can just use:
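The snippet itself was stripped from this copy; the GNU built-in the answer refers to is `__builtin_popcount`:

```cpp
#include <cstdint>

// GCC/Clang built-in population count. Compiles to a single popcnt
// instruction when the target supports it (e.g. with -mpopcnt), or a
// library call otherwise.
inline int builtinPopcount(std::uint32_t x) {
    return __builtin_popcount(x);
}
```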
In the worst case the compiler will generate a call to a library function; in the best case it will emit a single CPU instruction that does the same job faster.
The GCC intrinsics even work across multiple platforms. Popcount is becoming mainstream in the x86 architecture, so it makes sense to start using the intrinsic now. Other architectures have had popcount for years.
On x86, you can tell the compiler it can assume support for the `popcnt` instruction with `-mpopcnt`, or with `-msse4.2` to also enable the vector instructions that were added in the same generation. See the GCC x86 options. `-march=nehalem` (or `-march=` whatever CPU you want your code to assume and to tune for) could be a good choice. Running the resulting binary on an older CPU will result in an illegal-instruction fault. To make binaries optimized for the machine you build them on, use `-march=native` (with gcc, clang, or ICC).

MSVC provides an intrinsic for the x86 `popcnt` instruction, but unlike gcc's, it's really an intrinsic for the hardware instruction and requires hardware support.

Using `std::bitset<>::count()` instead of a built-in

In theory, any compiler that knows how to popcount efficiently for the target CPU should expose that functionality through ISO C++ `std::bitset<>`. In practice, you might be better off with the bit-hack AND/shift/ADD in some cases for some target CPUs.

For target architectures where hardware popcount is an optional extension (like x86), not all compilers have a `std::bitset` that takes advantage of it when available. For example, MSVC has no way to enable `popcnt` support at compile time, and always uses a table lookup, even with `/Ox /arch:AVX` (which implies SSE4.2, although technically there is a separate feature bit for `popcnt`). But at least you get something portable that works everywhere, and with gcc/clang and the right target options, you get hardware popcount for architectures that support it.
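A minimal sketch of the portable `std::bitset` approach:

```cpp
#include <bitset>
#include <cstdint>

// Portable ISO C++ popcount; a good compiler turns this into a hardware
// popcount instruction when the target options allow it.
inline int bitsetPopcount(std::uint32_t x) {
    return static_cast<int>(std::bitset<32>(x).count());
}
```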
See asm from gcc, clang, icc, and MSVC on the Godbolt compiler explorer.
x86-64 `gcc -O3 -std=gnu++11 -mpopcnt` emits this:

PowerPC64 `gcc -O3 -std=gnu++11` emits (for the `int` arg version):

This source isn't x86-specific or GNU-specific at all, but only compiles well for x86 with gcc/clang/icc.
Also note that gcc's fallback for architectures without single-instruction popcount is a byte-at-a-time table lookup. This isn't wonderful for ARM, for example.
For a happy medium between a 2^32-entry lookup table and iterating through each bit individually:
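The code from that page did not survive extraction; the following is a reconstruction of the well-known nibble-at-a-time trick of that shape, not a verbatim copy of the linked snippet:

```cpp
#include <cstdint>

// Middle-ground popcount: compute per-nibble counts arithmetically, fold
// nibbles into bytes, then sum the bytes with a modulus.
inline unsigned nibbleCount(std::uint32_t x) {
    // Within every nibble n, n - n/2 - n/4 - n/8 equals popcount(n).
    std::uint32_t t = x - ((x >> 1) & 0x77777777u)
                        - ((x >> 2) & 0x33333333u)
                        - ((x >> 3) & 0x11111111u);
    // Fold adjacent nibbles into per-byte counts, then sum the bytes:
    // 256 == 1 (mod 255), so the remainder mod 255 is the byte sum.
    return ((t + (t >> 4)) & 0x0F0F0F0Fu) % 255u;
}
```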
From http://ctips.pbwiki.com/CountBits