I have multiple dataframes like the ones below.
import pandas as pd

df1 = pd.DataFrame({'Col1': ["aaa", "ddd", "ggg"], 'Col2': ["bbb", "eee", "hhh"], 'Col3': ["ccc", "fff", "iii"]})
df2 = pd.DataFrame({'Col1': ["aaa", "zzz", "qqq"], 'Col2': ["bbb", "xxx", "eee"], 'Col3': ["ccc", "yyy", "www"]})
df3 = pd.DataFrame({'Col1': ["rrr", "zzz", "qqq", "ppp"], 'Col2': ["ttt", "xxx", "eee", "ttt"], 'Col3': ["yyy", "yyy", "www", "qqq"]})
Each dataframe has 3 columns, and sometimes rows overlap among the dataframes (e.g. df1 and df2 both contain the identical row "aaa, bbb, ccc").
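For a single pair of frames, the overlap can be checked with an inner merge. The sketch below is only an illustration of that pairwise case: with no `on` argument, `pd.merge` joins on all shared column names, so it keeps exactly the rows present in both frames. The variable name `overlap` is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'Col1': ["aaa", "ddd", "ggg"], 'Col2': ["bbb", "eee", "hhh"], 'Col3': ["ccc", "fff", "iii"]})
df2 = pd.DataFrame({'Col1': ["aaa", "zzz", "qqq"], 'Col2': ["bbb", "xxx", "eee"], 'Col3': ["ccc", "yyy", "www"]})

# No `on` argument: merge joins on every shared column (Col1, Col2, Col3).
# The default how="inner" keeps only rows that appear in both frames.
overlap = pd.merge(df1, df2)
print(overlap)
```

This only handles two frames at a time, which is why it does not scale directly to many dataframes.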
I want to know how the rows overlap among the dataframes, and I want to produce an output like the one below.
In this output, a cell is 1 if that row is found in that dataframe, and 0 otherwise. Does anyone know how to produce this output?
In the actual data, I have ~100 dataframes. I first tried pd.merge, but I could not figure out how to apply it to 100 dataframes.
Thank you very much for your help.
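Since pd.merge only combines two frames at a time, one way to extend it to many frames is to chain the merges with `functools.reduce`. This is a sketch under stated assumptions, not one of the answers below: the frames are assumed to live in a dict (the name `dfs` is hypothetical, and in practice it would hold all ~100 frames), each frame gets a constant indicator column named after it, and rows missing from a frame show up as NaN after the outer merges and are then filled with 0.

```python
from functools import reduce

import pandas as pd

df1 = pd.DataFrame({'Col1': ["aaa", "ddd", "ggg"], 'Col2': ["bbb", "eee", "hhh"], 'Col3': ["ccc", "fff", "iii"]})
df2 = pd.DataFrame({'Col1': ["aaa", "zzz", "qqq"], 'Col2': ["bbb", "xxx", "eee"], 'Col3': ["ccc", "yyy", "www"]})
df3 = pd.DataFrame({'Col1': ["rrr", "zzz", "qqq", "ppp"], 'Col2': ["ttt", "xxx", "eee", "ttt"], 'Col3': ["yyy", "yyy", "www", "qqq"]})

# Hypothetical container: this dict would hold every frame, keyed by name.
dfs = {"df1": df1, "df2": df2, "df3": df3}
cols = ["Col1", "Col2", "Col3"]

# Deduplicate each frame, then add a constant 1-valued column named after it.
tagged = [df.drop_duplicates().assign(**{name: 1}) for name, df in dfs.items()]

# Chain outer merges on the key columns; a row absent from a frame gets NaN there.
merged = reduce(lambda left, right: pd.merge(left, right, on=cols, how="outer"), tagged)

# NaN -> 0, floats back to ints, key columns become a sorted index.
result = merged.fillna(0).astype({name: int for name in dfs}).set_index(cols).sort_index()
print(result)
```

Each merge step can only grow the table, so with ~100 frames this performs 99 merges; the concat-based answers avoid that chain.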
Here is one way using concat and get_dummies.
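The original code for this approach is not preserved here, so the following is a minimal sketch of how concat and get_dummies can produce the 0/1 table. It assumes the frames are collected in a dict (the names `dfs`, `cols` and the `source` column are hypothetical): every frame is stacked with a label of where each row came from, that label is one-hot encoded, and identical rows are collapsed with `max()`.

```python
import pandas as pd

df1 = pd.DataFrame({'Col1': ["aaa", "ddd", "ggg"], 'Col2': ["bbb", "eee", "hhh"], 'Col3': ["ccc", "fff", "iii"]})
df2 = pd.DataFrame({'Col1': ["aaa", "zzz", "qqq"], 'Col2': ["bbb", "xxx", "eee"], 'Col3': ["ccc", "yyy", "www"]})
df3 = pd.DataFrame({'Col1': ["rrr", "zzz", "qqq", "ppp"], 'Col2': ["ttt", "xxx", "eee", "ttt"], 'Col3': ["yyy", "yyy", "www", "qqq"]})

dfs = {"df1": df1, "df2": df2, "df3": df3}  # hypothetical dict of all frames
cols = ["Col1", "Col2", "Col3"]

# Stack every frame and record in a "source" column which frame each row came from.
stacked = pd.concat([df.assign(source=name) for name, df in dfs.items()], ignore_index=True)

# One 0/1 column per source frame (dtype=int instead of the bool default of newer pandas).
dummies = pd.get_dummies(stacked["source"], dtype=int)

# Collapse identical rows; max() gives 1 if the row occurs in that frame at least once.
result = stacked[cols].join(dummies).groupby(cols).max()
print(result)
```

Using `max()` rather than `sum()` keeps the table strictly 0/1 even if a row is repeated inside one frame.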
Another option uses pandas.concat and groupby.
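The original code and output for the concat + groupby answer are missing, so here is a sketch of one way to do it, not necessarily that answer's exact code. As before, `dfs`, `cols` and the `source` column are hypothetical names. Grouping by the three key columns plus the source label counts each (row, frame) pair, and `unstack` turns the frame labels into columns.

```python
import pandas as pd

df1 = pd.DataFrame({'Col1': ["aaa", "ddd", "ggg"], 'Col2': ["bbb", "eee", "hhh"], 'Col3': ["ccc", "fff", "iii"]})
df2 = pd.DataFrame({'Col1': ["aaa", "zzz", "qqq"], 'Col2': ["bbb", "xxx", "eee"], 'Col3': ["ccc", "yyy", "www"]})
df3 = pd.DataFrame({'Col1': ["rrr", "zzz", "qqq", "ppp"], 'Col2': ["ttt", "xxx", "eee", "ttt"], 'Col3': ["yyy", "yyy", "www", "qqq"]})

dfs = {"df1": df1, "df2": df2, "df3": df3}  # hypothetical dict of all frames
cols = ["Col1", "Col2", "Col3"]

# Stack all frames with a label naming the frame each row came from.
stacked = pd.concat([df.assign(source=name) for name, df in dfs.items()], ignore_index=True)

result = (
    stacked.groupby(cols + ["source"])
    .size()                           # occurrences of each (row, source) pair
    .unstack("source", fill_value=0)  # one column per source, 0 where the row is absent
    .clip(upper=1)                    # a row repeated within one frame still counts as 1
)
print(result)
```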
A third option: first create an indicator column for each dataframe, then concat, groupby and sum.
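The steps just described (indicator column, concat, groupby, sum) can be sketched as follows. The dict `dfs` and the list `cols` are hypothetical names, and deduplicating each frame first is an added assumption so that the sum cannot exceed 1. Because concat aligns columns by name, each frame's missing indicator columns come out as NaN and are filled with 0 before summing.

```python
import pandas as pd

df1 = pd.DataFrame({'Col1': ["aaa", "ddd", "ggg"], 'Col2': ["bbb", "eee", "hhh"], 'Col3': ["ccc", "fff", "iii"]})
df2 = pd.DataFrame({'Col1': ["aaa", "zzz", "qqq"], 'Col2': ["bbb", "xxx", "eee"], 'Col3': ["ccc", "yyy", "www"]})
df3 = pd.DataFrame({'Col1': ["rrr", "zzz", "qqq", "ppp"], 'Col2': ["ttt", "xxx", "eee", "ttt"], 'Col3': ["yyy", "yyy", "www", "qqq"]})

dfs = {"df1": df1, "df2": df2, "df3": df3}  # hypothetical dict of all frames
cols = ["Col1", "Col2", "Col3"]

# Indicator column: a constant 1 named after the frame (after dropping duplicate rows).
tagged = [df.drop_duplicates().assign(**{name: 1}) for name, df in dfs.items()]

# Concat aligns columns by name; indicator columns a frame lacks become NaN -> 0.
combined = pd.concat(tagged, ignore_index=True).fillna(0)

# Summing per distinct row yields the 0/1 membership table.
result = combined.groupby(cols).sum().astype(int)
print(result)
```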