I have two columns in my DataFrame that store sets.
I want to perform a set union on the two columns using a fast vectorized operation:
df['union'] = df.set1 | df.set2
but this raises TypeError: unsupported operand type(s) for |: 'set' and 'bool'
because both columns also contain np.nan values.
Is there a good solution to overcome this?
For these operations pure Python may be more efficient.
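As a minimal sketch of what a pure-Python approach could look like (this is not necessarily the code originally posted here): treat every non-set value, such as np.nan, as an empty set and take the union row by row in a list comprehension. The helper names as_set and union_columns and the sample data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def as_set(value):
    """Return value unchanged if it is a set, otherwise an empty set (covers np.nan)."""
    return value if isinstance(value, (set, frozenset)) else set()


def union_columns(left, right):
    """Row-wise set union of two Series, treating missing values as empty sets."""
    unions = [as_set(a) | as_set(b) for a, b in zip(left, right)]
    # Reuse the left index so the result aligns with the original frame.
    return pd.Series(unions, index=left.index)


# Hypothetical sample data: sets mixed with np.nan in both columns.
df = pd.DataFrame({
    "set1": [{1, 2}, np.nan, {3}, np.nan],
    "set2": [{2, 3}, {4}, np.nan, np.nan],
})
df["union"] = union_columns(df["set1"], df["set2"])
print(df)
```

Rows where both values are missing end up as an empty set here; if NaN should be preserved in that case, the comprehension would need an extra check.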
If we could use + instead, it would probably take half the time (inheritance may not be worth it):
DataFrame for timings:
This is the best I could come up with:
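One way this could look, as a hedged sketch rather than the exact code from this answer, is Series.combine with a small function that swaps missing values for empty sets. The function name union_or_empty and the sample frame are assumptions. Note that combine still calls the function once per row in Python, so it is not truly vectorized.

```python
import numpy as np
import pandas as pd


def union_or_empty(a, b):
    """Union two values, treating anything that is not a set (e.g. np.nan) as empty."""
    left = a if isinstance(a, set) else set()
    right = b if isinstance(b, set) else set()
    return left | right


# Hypothetical sample data: sets mixed with np.nan.
df = pd.DataFrame({
    "set1": [{1, 2}, np.nan, {3}],
    "set2": [{2, 3}, {4}, np.nan],
})

# Series.combine applies the function element-wise to the aligned pairs.
df["union"] = df["set1"].combine(df["set2"], union_or_empty)
print(df)
```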
Wow!
I expected method 2 to be quicker. Not so!
Example