Slow bitwise operations

2019-02-13 09:10发布

问题:

I am working on a Python library that performs a lot of bitwise operations on long bit strings, and I want to find a bit string type that will maximize its speed. I have tried the built-in Python int type, numpy, bitstring, and bitarray, and suprisingly, the Python ints seem to win hands down when it comes to bitwise operations. Everything I have googled says numpy should be much faster for vectorized operations like this. Am I using numpy wrong somehow? Is there another Python library I can use that actually improves on Python's built-in int type?

from timeit import timeit
import random


size = 10000


def int_to_bits(i):
    result = []
    for _ in range(size):
        result.append(i % 2)
        i >>= 1
    return result



x = random.randrange(2**size)
y = random.randrange(2**size)

print(x.bit_length(), y.bit_length())

x_bits = int_to_bits(x)
y_bits = int_to_bits(y)

t = timeit(
    stmt='a & b',
    setup='a = %d; b = %d' % (x, y)
)
print("raw ints:", t)

t = timeit(
    stmt='a & b',
    setup=('import numpy;'
           'a = numpy.array(%r, dtype=int);'
           'b = numpy.array(%r, dtype=int)') % (x_bits, y_bits)
)
print('numpy int array:', t)

t = timeit(
    stmt='a & b',
    setup=('import numpy;'
           'a = numpy.array(%r, dtype=bool);'
           'b = numpy.array(%r, dtype=bool)') % (x_bits, y_bits)
)
print('numpy bool array:', t)

t = timeit(
    stmt='a & b',
    setup=('import numpy;'
           'a = numpy.packbits(%r);'
           'b = numpy.packbits(%r)') % (x_bits, y_bits)
)
print('numpy packed bits:', t)

t = timeit(
    stmt='a & b',
    setup=('import bitstring;'
           'a = bitstring.BitString(%r);'
           'b = bitstring.BitString(%r)') % (x_bits, y_bits)
)
print('bitstring:', t)

t = timeit(
    stmt='a & b',
    setup=('import bitarray;'
           'a = bitarray.bitarray(%r);'
           'b = bitarray.bitarray(%r)') % (x_bits, y_bits)
)
print('bitarray:', t)

Results:

10000 10000
raw ints: 0.29606562735373115
numpy int array: 7.400762747057885
numpy bool array: 1.1108355715984288
numpy packed bits: 1.3064737574273284
bitstring: 380.9796937642803
bitarray: 1.4451143449501842

EDIT:

There seems to be a lot of confusion about how single operations on Python ints/longs are comparable to vector operations on entire numpy bit arrays. A 10,000-bit Python int/long value, when treated as a bit mask (using the & operator just like we can do with ints or longs in C/C++) is directly comparable to a numpy bool array of length 10,000, because they both contain the same number of bits, albeit represented in 2 different ways. The same is true for the other ways of representing 10,000 bits that I tried, including using numpy packed bit arrays, numpy int arrays, and bit array/string types from other libraries. They are all comparable because they are all computing the same function on the same sequences of bits. All that matters here is that I can represent all 10,000 bits and that I can perform bitwise operations on them. If anyone can suggest a more efficient way to represent long, fixed-length sequences of bits that allows bitwise operators (&, |, and ~) to be used, that is what I am looking for.

If you're still confused as to how a Python int/long value can store the same information as a numpy bool array or a numpy binary-valued int array, please refer to the int_to_bits function in the code above; it demonstrates how to extract the bits out of a Python int/long, which shows that performing the & operation on two 10,000-bit ints is fundamentally the same as performing it element-by-element on a list or array of 10,000 Boolean values.

回答1:

As far as I can tell, the built-in Python 3 int is the only one of the options you tested that computes the & in chunks of more than one byte at a time. (I haven't fully figured out what everything in the NumPy source for this operation does, but it doesn't look like it has an optimization to compute this in chunks bigger than the dtype.)

  • bitarray goes byte-by-byte,
  • the bool and 1-bit-per-int NumPy attempts go bit by bit,
  • the packed NumPy attempt goes byte-by-byte, and
  • the bitstring source goes byte-by-byte, as well as doing some things that screw up its attempts to gain speed through Cython, making it by far the slowest.

In contrast, the int operation goes by either 15-bit or 30-bit digits, depending on the value of the compile-time parameter PYLONG_BITS_IN_DIGIT. I don't know which setting is the default.

You can speed up the NumPy attempt by using a packed representation and a larger dtype. It looks like on my machine, a 32-bit dtype works fastest, beating Python ints; I don't know what it's like on your setup. Testing with 10240-bit values in each format, I get

>>> timeit.timeit('a & b', 'import numpy; a = b = numpy.array([0]*160, dtype=num
py.uint64)')
1.3918750826524047
>>> timeit.timeit('a & b', 'import numpy; a = b = numpy.array([0]*160*8, dtype=n
umpy.uint8)')
1.9460716604953632
>>> timeit.timeit('a & b', 'import numpy; a = b = numpy.array([0]*160*2, dtype=n
umpy.uint32)')
1.1728465435917315
>>> timeit.timeit('a & b', 'a = b = 2**10240-1')
1.5999407862400403


回答2:

What you are trying to test - are these vector operations at all? You are simply trying to compare speeds of 1 operation and there plain python is going to win 'cos it doesn't have to setup numpy arrays or bitarrays.

How about trying out following?

x = np.array([random.randrange(2**31)]*1000) 
y = np.array([random.randrange(2**31)]*1000) 

%timeit x & y # in ipython

%timeit [ a & b for (a,b) in zip(x,y)] # even though x and y are numpy arrays, we are iterating over them - and not doing any vector operations

Interestingly, if

xxx = [random.randrange(2**31)] * 1000
yyy = [random.randrange(2**31)] * 1000 

and then

%timeit [a & b for (a,b) in zip(xxx,yyy)]

pure python lists, iterating over them is faster than iterating over numpy arrays.. a bit counter intuitive. Not sure why.

Similarly you can try for bitstrings and bitarrays

Is this what you are looking at?