8 bits representing the number 7 look like this:
00000111
Three bits are set.
What are algorithms to determine the number of set bits in a 32-bit integer?
8 bits representing the number 7 look like this:
00000111
Three bits are set.
What are algorithms to determine the number of set bits in a 32-bit integer?
What do you means with "Best algorithm"? The shorted code or the fasted code? Your code look very elegant and it has a constant execution time. The code is also very short.
But if the speed is the major factor and not the code size then I think the follow can be faster:
I think that this will not more faster for a 64 bit value but a 32 bit value can be faster.
This is known as the 'Hamming Weight', 'popcount' or 'sideways addition'.
The 'best' algorithm really depends on which CPU you are on and what your usage pattern is.
Some CPUs have a single built-in instruction to do it and others have parallel instructions which act on bit vectors. The parallel instructions (like x86's
popcnt
, on CPUs where it's supported) will almost certainly be fastest. Some other architectures may have a slow instruction implemented with a microcoded loop that tests a bit per cycle (citation needed).A pre-populated table lookup method can be very fast if your CPU has a large cache and/or you are doing lots of these instructions in a tight loop. However it can suffer because of the expense of a 'cache miss', where the CPU has to fetch some of the table from main memory.
If you know that your bytes will be mostly 0's or mostly 1's then there are very efficient algorithms for these scenarios.
I believe a very good general purpose algorithm is the following, known as 'parallel' or 'variable-precision SWAR algorithm'. I have expressed this in a C-like pseudo language, you may need to adjust it to work for a particular language (e.g. using uint32_t for C++ and >>> in Java):
This has the best worst-case behaviour of any of the algorithms discussed, so will efficiently deal with any usage pattern or values you throw at it.
This bitwise-SWAR algorithm could parallelize to be done in multiple vector elements at once, instead of in a single integer register, for a speedup on CPUs with SIMD but no usable popcount instruction. (e.g. x86-64 code that has to run on any CPU, not just Nehalem or later.)
However, the best way to use vector instructions for popcount is usually by using a variable-shuffle to do a table-lookup for 4 bits at a time of each byte in parallel. (The 4 bits index a 16 entry table held in a vector register).
On Intel CPUs, the hardware 64bit popcnt instruction can outperform an SSSE3
PSHUFB
bit-parallel implementation by about a factor of 2, but only if your compiler gets it just right. Otherwise SSE can come out significantly ahead. Newer compiler versions are aware of the popcnt false dependency problem on Intel.References:
https://graphics.stanford.edu/~seander/bithacks.html
https://en.wikipedia.org/wiki/Hamming_weight
http://gurmeet.net/puzzles/fast-bit-counting-routines/
http://aggregate.ee.engr.uky.edu/MAGIC/#Population%20Count%20(Ones%20Count)
if you're using C++ another option is to use template metaprogramming:
usage would be:
you could of course further expand this template to use different types (even auto-detecting bit size) but I've kept it simple for clarity.
edit: forgot to mention this is good because it should work in any C++ compiler and it basically just unrolls your loop for you if a constant value is used for the bit count (in other words, I'm pretty sure it's the fastest general method you'll find)
I always use this in Competitive Programming and it's easy to write and efficient:
There are many algorithm to count the set bits; but i think the best one is the faster one! You can see the detailed on this page:
Bit Twiddling Hacks
I suggest this one:
Counting bits set in 14, 24, or 32-bit words using 64-bit instructions
This method requires a 64-bit CPU with fast modulus division to be efficient. The first option takes only 3 operations; the second option takes 10; and the third option takes 15.
It's not the fastest or best solution, but I found the same question in my way, and I started to think and think. finally I realized that it can be done like this if you get the problem from mathematical side, and draw a graph, then you find that it's a function which has some periodic part, and then you realize the difference between the periods... so here you go: