Fastest way to convert a bitstring numpy array to integers

Posted 2020-07-18 10:06

Question:

I have a numpy array of bitstrings, and I want to convert them to base-2 integers so that I can perform bitwise XOR operations on them. In plain Python I can convert a single string to an integer with base 2 like this:

int('000011000',2)

Is there a faster, better way to do this in numpy? An example of the kind of array I am working with:

array([['0001'],
       ['0010']], 
      dtype='|S4')

and I expect to convert it to:

array([[1],[2]])

Answer 1:

One could use np.fromstring to split each string into its individual characters as uint8 values, then use matrix multiplication to reduce each row of bits to its decimal value. With A as the input array, one approach would be:

# Convert each bit of input string to numerals
str2num = (np.fromstring(A, dtype=np.uint8)-48).reshape(-1,4)

# Setup conversion array for binary number to decimal equivalent
de2bi_convarr = 2**np.arange(3,-1,-1)

# Use matrix multiplication for reducing each row of str2num to a single decimal
out = str2num.dot(de2bi_convarr)
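
On recent NumPy releases, binary-mode np.fromstring is deprecated and np.frombuffer is the suggested replacement. The following is a minimal sketch of the same idea under that substitution, not the answer's original code. The function name bits_to_int_frombuffer is hypothetical, and the sketch assumes a byte-string array (dtype 'S4' or similar) in which every string fills the full declared width.

```python
import numpy as np

def bits_to_int_frombuffer(A):
    """Convert a fixed-width byte-string array of '0'/'1' characters to integers.

    Hypothetical helper; assumes every string fills the full dtype width.
    """
    width = A.dtype.itemsize                          # characters per bitstring, 4 for 'S4'
    raw = np.frombuffer(A.tobytes(), dtype=np.uint8)  # ASCII codes: '0' -> 48, '1' -> 49
    bits = (raw - 48).reshape(-1, width)              # one row of 0/1 digits per bitstring
    weights = 2 ** np.arange(width - 1, -1, -1)       # [8, 4, 2, 1] for width 4
    return bits.dot(weights)

A = np.array([['0001'], ['1001'], ['1100'], ['0010']], dtype='S4')
result = bits_to_int_frombuffer(A)
print(result)  # [ 1  9 12  2]
```

Reading the width from A.dtype.itemsize avoids hard-coding 4. Strings shorter than the declared width are null-padded in memory and would give wrong values here.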

Sample run -

In [113]: A    # Modified to show more variety
Out[113]: 
array([['0001'],
       ['1001'],
       ['1100'],
       ['0010']], 
      dtype='|S4')

In [114]: str2num = (np.fromstring(A, dtype=np.uint8)-48).reshape(-1,4)

In [115]: str2num
Out[115]: 
array([[0, 0, 0, 1],
       [1, 0, 0, 1],
       [1, 1, 0, 0],
       [0, 0, 1, 0]], dtype=uint8)

In [116]: de2bi_convarr = 2**np.arange(3,-1,-1)

In [117]: de2bi_convarr
Out[117]: array([8, 4, 2, 1])

In [118]: out = str2num.dot(de2bi_convarr)

In [119]: out
Out[119]: array([ 1,  9, 12,  2])

An alternative avoids np.fromstring. Here we convert the strings to an integer dtype first, so that '1001' becomes the decimal number 1001, and then separate out each decimal digit, which gives the equivalent of str2num in the previous method. The rest of the code stays the same. Thus, an alternative implementation would be:

# Convert to int array and thus convert each bit of input string to numerals
# (the np.int alias was removed in NumPy 1.24; the builtin int behaves the same)
str2num = np.remainder(A.astype(int)//(10**np.arange(3,-1,-1)),10)

de2bi_convarr = 2**np.arange(3,-1,-1)
out = str2num.dot(de2bi_convarr)
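
To make the digit-extraction step concrete, here is a small worked sketch with the intermediate arrays spelled out. The variable names as_decimal, powers_of_ten and digits are illustrative only, and the input is assumed to be a 4-character byte-string array.

```python
import numpy as np

A = np.array([['0001'], ['1001'], ['1100'], ['0010']], dtype='S4')

# Parse each string as a decimal number: '1001' -> 1001
as_decimal = A.astype(int)                    # shape (4, 1)

# Integer-divide by 1000, 100, 10, 1 and keep the last decimal digit each time.
# For 1001: 1001 // [1000, 100, 10, 1] = [1, 10, 100, 1001], and % 10 gives [1, 0, 0, 1]
powers_of_ten = 10 ** np.arange(3, -1, -1)    # [1000, 100, 10, 1]
digits = (as_decimal // powers_of_ten) % 10   # shape (4, 4) via broadcasting

# Weight the binary digits by 8, 4, 2, 1 and sum each row
out = digits.dot(2 ** np.arange(3, -1, -1))
```

Because each bitstring is read as a decimal number, this trick only works while that number fits in the integer dtype, which for int64 means bitstrings of at most 18 characters.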

Runtime tests

Let's time all the approaches listed so far, including @Kasramvd's loop-based solution.

In [198]: # Setup a huge array of such strings
     ...: A = np.array([['0001'],['1001'],['1100'],['0010']],dtype='|S4')
     ...: A = A.repeat(10000,axis=0)


In [199]: def app1(A):             
     ...:     str2num = (np.fromstring(A, dtype=np.uint8)-48).reshape(-1,4)
     ...:     de2bi_convarr = 2**np.arange(3,-1,-1)
     ...:     out = str2num.dot(de2bi_convarr)    
     ...:     return out
     ...: 
     ...: def app2(A):             
     ...:     str2num = np.remainder(A.astype(int)//(10**np.arange(3,-1,-1)),10)
     ...:     de2bi_convarr = 2**np.arange(3,-1,-1)
     ...:     out = str2num.dot(de2bi_convarr)    
     ...:     return out
     ...: 

In [200]: %timeit app1(A)
1000 loops, best of 3: 1.46 ms per loop

In [201]: %timeit app2(A)
10 loops, best of 3: 36.6 ms per loop

In [202]: %timeit np.array([[int(i[0], 2)] for i in A]) # @Kasramvd's solution
10 loops, best of 3: 61.6 ms per loop
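
Before comparing timings, it is worth confirming that the approaches return the same values. The sketch below does that and then takes rough timings with the standard timeit module. It is an illustration rather than the original benchmark: np.frombuffer stands in for the deprecated binary-mode np.fromstring, app_loop is a hypothetical name for the list-comprehension solution, and the numbers it produces depend on the machine and NumPy version.

```python
import timeit
import numpy as np

A = np.array([['0001'], ['1001'], ['1100'], ['0010']], dtype='S4').repeat(10000, axis=0)
weights = 2 ** np.arange(3, -1, -1)

def app1(A):
    # np.frombuffer is an assumed stand-in for binary-mode np.fromstring
    return (np.frombuffer(A.tobytes(), dtype=np.uint8) - 48).reshape(-1, 4).dot(weights)

def app2(A):
    return np.remainder(A.astype(int) // (10 ** np.arange(3, -1, -1)), 10).dot(weights)

def app_loop(A):
    return np.array([[int(i[0], 2)] for i in A])

# app_loop returns a column vector, so flatten it before comparing
agree = np.array_equal(app1(A), app2(A)) and np.array_equal(app1(A), app_loop(A).ravel())

# Average seconds per call over a few runs
timings = {f.__name__: timeit.timeit(lambda f=f: f(A), number=5) / 5
           for f in (app1, app2, app_loop)}
print(agree, timings)
```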


Answer 2:

Following the KISS principle, I'd suggest a list comprehension:

>>> np.array([[int(i[0], 2)] for i in a])
array([[1],
       [2]])
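
Since the question's goal is a bitwise XOR, here is a short sketch of applying one to the converted integers. The mask value is a made-up example, and the array is created with an explicit byte-string dtype to match the question.

```python
import numpy as np

a = np.array([['0001'], ['0010']], dtype='S4')

# Convert each bitstring to an integer, as in the list comprehension above
ints = np.array([[int(i[0], 2)] for i in a])

# XOR every element with a hypothetical mask, 0b0011
mask = 0b0011
xored = ints ^ mask          # 1 ^ 3 = 2, 2 ^ 3 = 1

# Back to fixed-width bitstrings for display
as_bits = [np.binary_repr(v, width=4) for v in xored.ravel()]
```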