Same as "python pandas: how to find rows in one dataframe but not in another?", but with multiple columns.
This is the setup:
import pandas as pd
df = pd.DataFrame(dict(
    col1=[0, 1, 1, 2],
    col2=['a', 'b', 'c', 'b'],
    extra_col=['this', 'is', 'just', 'something'],
))
other = pd.DataFrame(dict(
    col1=[1, 2],
    col2=['b', 'c'],
))
Now I want to select the rows from df that don't exist in other, matching on both col1 and col2.
In SQL I would do:
select * from df
where not exists (
select * from other o
where df.col1 = o.col1 and
df.col2 = o.col2
)
In pandas I can do something like this, but it feels very ugly. Part of the ugliness could be avoided if df had an id column, but one is not always available.
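key_col = ['col1', 'col2']
# Keep the original row labels as a regular 'index' column
df_with_idx = df.reset_index()
# Labels of the df rows whose (col1, col2) pair also appears in other
common = pd.merge(df_with_idx, other, on=key_col)['index']
mask = df_with_idx['index'].isin(common)
# Drop the matching rows, then the helper column
desired_result = df_with_idx[~mask].drop('index', axis=1)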
Is there a more elegant way?
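A more compact alternative, sketched here and not taken from the original thread, is to turn the key columns of each frame into an index with set_index and test membership with Index.isin. With two key columns this gives a MultiIndex, so each row's (col1, col2) pair is compared as one tuple, which mirrors the NOT EXISTS condition without a merge or a helper column. It assumes the key columns contain ordinary hashable values.

```python
import pandas as pd

# Setup from the question
df = pd.DataFrame(dict(
    col1=[0, 1, 1, 2],
    col2=['a', 'b', 'c', 'b'],
    extra_col=['this', 'is', 'just', 'something'],
))
other = pd.DataFrame(dict(
    col1=[1, 2],
    col2=['b', 'c'],
))

key_col = ['col1', 'col2']

# Build a MultiIndex of (col1, col2) tuples for each frame and test,
# row by row, whether df's pair occurs among other's pairs.
mask = df.set_index(key_col).index.isin(other.set_index(key_col).index)

# mask is a boolean array aligned with df's rows; invert it to keep
# only the rows of df that have no match in other.
desired_result = df[~mask]
print(desired_result)
```

Because df is filtered directly, its original index labels and extra_col are kept as they are.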
Seems a little bit more elegant...
Since pandas 0.17.0 there is a new indicator parameter you can pass to merge. It adds a column telling you whether each row is present only in the left frame, only in the right frame, or in both. You can then filter the merged frame by keeping only the 'left_only' rows.
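A minimal sketch of that approach, assuming a pandas version recent enough to support both indicator=True (0.17.0+) and drop(columns=...) (0.21+). A left merge keeps every row of df, and indicator=True adds a '_merge' column whose values are 'left_only', 'right_only' or 'both'.

```python
import pandas as pd

# Setup from the question
df = pd.DataFrame(dict(
    col1=[0, 1, 1, 2],
    col2=['a', 'b', 'c', 'b'],
    extra_col=['this', 'is', 'just', 'something'],
))
other = pd.DataFrame(dict(
    col1=[1, 2],
    col2=['b', 'c'],
))

key_col = ['col1', 'col2']

# Left join on the key columns only; indicator=True records in '_merge'
# whether each resulting row came from df only or matched a row in other.
merged = df.merge(other[key_col], on=key_col, how='left', indicator=True)

# Keep the rows that exist only in df, then drop the helper column.
desired_result = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
print(desired_result)
```

Merging against other[key_col] rather than all of other keeps any non-key columns of other out of the result. Rows marked 'left_only' had no match, so they are never duplicated even if other contains repeated key pairs. Note that merge builds a fresh index, which coincides with df's here only because df uses the default 0..n-1 index.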