I need a fast way to count the number of bits in an integer in Python. My current solution is
bin(n).count("1")
but I am wondering if there is any faster way of doing this?
PS: (I am representing a big 2D binary array as a single list of numbers and doing bitwise operations, and that brings the time down from hours to minutes. And now I would like to get rid of those extra minutes.)
Edit: it has to be in Python 2.7 or 2.6, and optimizing for small numbers does not matter that much since that would not be a clear bottleneck, but I do have numbers with 10,000+ bits in some places.
For example, this is a 2000-bit case:
12448057941136394342297748548545082997815840357634948550739612798732309975923280685245876950055614362283769710705811182976142803324242407017104841062064840113262840137625582646683068904149296501029754654149991842951570880471230098259905004533869130509989042199261339990315125973721454059973605358766253998615919997174542922163484086066438120268185904663422979603026066685824578356173882166747093246377302371176167843247359636030248569148734824287739046916641832890744168385253915508446422276378715722482359321205673933317512861336054835392844676749610712462818600179225635467147870208L
Here's a Python implementation of the population count algorithm, as explained in this post:
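The code for that implementation isn't shown here; a sketch of the standard 32-bit parallel ("SWAR") population count, adapted for Python, would look like this (popcount32 is a name chosen for illustration):

```python
def popcount32(i):
    # Classic parallel bit count for 0 <= i < 2**32: sum adjacent
    # pairs of bits, then nibbles, then gather the byte sums via a
    # multiply. The & 0xFFFFFFFF emulates 32-bit wraparound, which
    # Python's unbounded ints don't do on their own.
    i = i - ((i >> 1) & 0x55555555)
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333)
    i = (i + (i >> 4)) & 0x0F0F0F0F
    return ((i * 0x01010101) & 0xFFFFFFFF) >> 24
```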
It will work for 0 <= i < 0x100000000.

You said Numpy was too slow. Were you using it to store individual bits? Why not extend the idea of using ints as bit arrays, but use Numpy to store those?
Store n bits as an array of ceil(n/32.) 32-bit ints. You can then work with the Numpy array the same (well, similar enough) way you use ints, including using them to index another array.

The algorithm is basically to compute, in parallel, the number of bits set in each cell, and then sum up the bitcount of each cell.
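Assuming Numpy is available, this idea can be sketched as the same parallel bit-count applied elementwise to a uint32 array, followed by a sum (bitcount_array is a hypothetical name; this is not the answer's original code, which isn't shown here):

```python
import numpy as np

def bitcount_array(arr):
    # Elementwise parallel popcount on a numpy array of dtype uint32,
    # then a single sum over all cells. All masks fit in 32 bits, so
    # no intermediate step can overflow a uint32 lane.
    arr = (arr & 0x55555555) + ((arr >> 1) & 0x55555555)
    arr = (arr & 0x33333333) + ((arr >> 2) & 0x33333333)
    arr = (arr + (arr >> 4)) & 0x0F0F0F0F
    arr = arr + (arr >> 8)
    arr = (arr + (arr >> 16)) & 0x3F
    return int(arr.sum())
```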
Though I'm surprised no one suggested you write a C module.
It turns out your starting representation is a list of lists of ints which are either 1 or 0. Simply count them in that representation.
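For instance, with a small hypothetical grid:

```python
# Hypothetical 2D binary array stored as nested lists of 0/1 ints.
grid = [[1, 0, 1],
        [0, 1, 1]]

# Counting the set bits is just summing the rows.
total_set_bits = sum(sum(row) for row in grid)
# equivalently: sum(map(sum, grid))
```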
The number of bits in an integer is constant in Python.
However, if you want to count the number of set bits, the fastest way is to create a list conforming to the following pseudocode:
[numberofsetbits(n) for n in range(MAXINT)]
This will provide you a constant-time lookup after you have generated the list. See @PaoloMoretti's answer for a good implementation of this. Of course, you don't have to keep this all in memory - you could use some sort of persistent key-value store, or even MySQL. (Another option would be to implement your own simple disk-based storage.)
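A full table up to MAXINT is rarely practical, so one common compromise - an assumption here, not part of the answer - is to precompute a table for 16-bit chunks and look larger numbers up chunk by chunk:

```python
# Precompute set-bit counts for every 16-bit value (65536 entries).
TABLE16 = [bin(i).count("1") for i in range(1 << 16)]

def count_set_bits(n):
    # Consume n (a non-negative int of any size) 16 bits at a time,
    # looking each chunk up in the precomputed table.
    count = 0
    while n:
        count += TABLE16[n & 0xFFFF]
        n >>= 16
    return count
```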
For arbitrary-length integers, bin(n).count("1") is the fastest I could find in pure Python.

I tried adapting Óscar's and Adam's solutions to process the integer in 64-bit and 32-bit chunks, respectively. Both were at least ten times slower than bin(n).count("1") (the 32-bit version took about half again as much time).

On the other hand, gmpy popcount() took about 1/20th of the time of bin(n).count("1"). So if you can install gmpy, use that.

To answer a question in the comments, for bytes I'd use a lookup table. You can generate it at runtime:
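The original snippet isn't shown here; one way to generate such a table at runtime would be:

```python
# 256-entry lookup table: counts[b] is the number of 1 bits in byte b.
counts = bytearray(bin(x).count("1") for x in range(256))
```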
Or just do that once and paste the repr() in to define it literally.

Then it's counts[x] to get the number of 1 bits in x, where 0 <= x <= 255.

You can use the algorithm used to get the binary string [1] of an integer, but instead of concatenating the string, count the number of ones:
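The linked page's code isn't reproduced here; a loop in that spirit, examining one bit per iteration instead of building up a string, might look like:

```python
def count_ones(n):
    # Same traversal as converting n to a binary string, but instead
    # of appending digits we accumulate the low bit each round.
    count = 0
    while n:
        count += n & 1
        n >>= 1
    return count
```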
[1] https://wiki.python.org/moin/BitManipulation
You can adapt the following algorithm:
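The algorithm itself isn't shown here; a 64-bit version of the bucket-merging technique would be (popcount64 is a name chosen for illustration):

```python
def popcount64(m):
    # Parallel bit count on a 64-bit value: at each step the number
    # of "buckets" halves and their width doubles, each bucket
    # holding the count of set bits in its span.
    m = (m & 0x5555555555555555) + ((m >> 1) & 0x5555555555555555)
    m = (m & 0x3333333333333333) + ((m >> 2) & 0x3333333333333333)
    m = (m & 0x0F0F0F0F0F0F0F0F) + ((m >> 4) & 0x0F0F0F0F0F0F0F0F)
    m = (m & 0x00FF00FF00FF00FF) + ((m >> 8) & 0x00FF00FF00FF00FF)
    m = (m & 0x0000FFFF0000FFFF) + ((m >> 16) & 0x0000FFFF0000FFFF)
    m = (m & 0x00000000FFFFFFFF) + ((m >> 32) & 0x00000000FFFFFFFF)
    return m
```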
This works for 64-bit positive numbers, but it's easily extendable, and the number of operations grows with the logarithm of the argument (i.e. linearly with the bit size of the argument).
In order to understand how this works, imagine that you divide the entire 64-bit string into 64 1-bit buckets. Each bucket's value is equal to the number of bits set in the bucket (0 if no bits are set and 1 if one bit is set). The first transformation results in an analogous state, but with 32 buckets, each 2 bits long. This is achieved by appropriately shifting the buckets and adding their values (one addition takes care of all buckets, since no carry can occur across buckets - an n-bit number is always long enough to encode the number n). Further transformations lead to states with an exponentially decreasing number of buckets of exponentially growing size, until we arrive at one 64-bit-long bucket. This gives the number of bits set in the original argument.