I have a dataframe of zeros and ones. I want to treat each column as if its values were the binary representation of an integer. What is the easiest way to make this conversion?
I want this:
import pandas as pd
df = pd.DataFrame([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1]])
print(df)
   0  1  2
0  1  0  1
1  1  1  0
2  0  1  1
3  0  0  1
converted to:
0    12
1     6
2    11
dtype: int64
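Reading the expected output back, the first row is the most significant bit. Writing x_{ij} for the value in row i of column j and n for the number of rows (notation introduced here, not in the question), column 0 (values 1, 1, 0, 0) works out as:

```latex
\text{col}_j = \sum_{i=0}^{n-1} x_{ij}\, 2^{\,n-1-i}, \qquad
\text{col}_0 = 1\cdot 2^3 + 1\cdot 2^2 + 0\cdot 2^1 + 0\cdot 2^0 = 12
```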
As efficiently as possible.
Similar in concept to @jezrael's solution that used a dot-product, but with a couple of improvements. We can avoid the transpose by putting the array of powers of 2 on the left side of the dot-product. This would be beneficial for large arrays, as transposing them would have some overhead. Also, operating on NumPy arrays is better for these number-crunching cases, so we can operate on df.values instead. At the end, we need to convert back to a pandas Series/DataFrame for the final output. Combining these two improvements, the modified implementation would be:
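The original code block did not survive. A minimal sketch of the approach as described (powers of two on the left of the dot product, operating on df.values, wrapped back into a Series) might look like this; the helper name binary_columns_to_int is hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1]])

def binary_columns_to_int(df):
    """Hypothetical helper: read each column as a binary number,
    with the first row as the most significant bit."""
    a = df.values                    # plain NumPy array, no pandas overhead
    n_rows = a.shape[0]
    # Powers of two in descending order: [2**(n-1), ..., 2, 1]
    powers = 2 ** np.arange(n_rows - 1, -1, -1)
    # (n_rows,) dot (n_rows, n_cols) -> (n_cols,); `a` is never transposed
    return pd.Series(powers.dot(a), index=df.columns)

print(binary_columns_to_int(df))
```

Because the dot product uses fixed-width NumPy integers, this sketch assumes the frame has few enough rows for the result to fit in the default integer type (at most 63 rows for int64).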
Runtime test -
You can create a string from the column values and then use int(binary_string, base=2) to convert it to an integer. Not sure about efficiency: multiplying by the relevant powers of 2 and then summing probably takes better advantage of fast NumPy operations, but this is probably more convenient.
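A minimal sketch of the string-based approach, assuming DataFrame.apply is used to visit each column (the helper name column_to_int is hypothetical):

```python
import pandas as pd

df = pd.DataFrame([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1]])

def column_to_int(col):
    # Join the 0/1 values top-to-bottom into a string such as "1100"
    binary_string = ''.join(col.astype(str))
    # Parse it as a base-2 integer; Python ints have arbitrary precision
    return int(binary_string, base=2)

# DataFrame.apply passes each column as a Series by default (axis=0)
result = df.apply(column_to_int)
print(result)
```

The per-column Python call and string building are why this is likely slower than a vectorized dot product, although int() itself never overflows.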
A similar solution, but faster:
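The code for this answer was not preserved. Going by the earlier remark that it used a dot product on the transposed frame, a sketch along those lines (an assumption, not the original code) could be:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1]])

# Descending powers of two via bit shifting: 1 << k == 2**k
powers = 1 << np.arange(df.shape[0] - 1, -1, -1)

# df.T has one row per original column, so the dot product with the
# powers vector yields one integer per column, indexed by df.columns
result = df.T.dot(powers)
print(result)
```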
Timings:
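The timing results themselves were not preserved. A hypothetical harness for reproducing such a comparison is sketched below; the data size and function names are assumptions, and the numbers it prints will depend on the machine and library versions:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 30 rows of random bits, 2000 columns
rng = np.random.default_rng(0)
big = pd.DataFrame(rng.integers(0, 2, size=(30, 2000)))

def via_strings(df):
    # Build a "0101..." string per column and parse it in base 2
    return df.apply(lambda col: int(''.join(col.astype(str)), base=2))

def via_transpose_dot(df):
    # Dot product on the transposed frame
    return df.T.dot(1 << np.arange(df.shape[0] - 1, -1, -1))

def via_numpy_dot(df):
    # Powers on the left, raw NumPy array on the right, no transpose
    powers = 2 ** np.arange(df.shape[0] - 1, -1, -1)
    return pd.Series(powers.dot(df.values), index=df.columns)

for fn in (via_strings, via_transpose_dot, via_numpy_dot):
    seconds = timeit.timeit(lambda: fn(big), number=5)
    print(f"{fn.__name__}: {seconds / 5 * 1e3:.2f} ms per call")
```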