how to identify whats NOT in the inner join during

2020-07-10 19:50发布

I have one frame that looks like this (df):

                                 2000q1         2000q2         2000q3  
State        RegionName                                                  
New York     New York                NaN            NaN            NaN   
California   Los Angeles   207066.666667  214466.666667  220966.666667   
Illinois     Chicago       138400.000000  143633.333333  147866.666667

(notice that State,RegionName here is a MultiIndex)

and one frame that looks like this (ut):

     State    RegionName
0  Alabama        Auburn
1  Alabama      Florence
2  Alabama  Jacksonville
3  Alabama    Livingston
4  Alabama    Montevallo

So to get all rows where State,RegionName are in both dataframes, I do this:

dfut = pd.merge(df, ut, how='inner', left_index=True, right_on=['State', 'RegionName'])

That works. I now want a list of rows where rows from df frame are NOT in ut frame -- like a "NOT inner join". I am pretty sure that I need to do a LEFT join which will give me the entire df, but I am not sure how to subtract ut intersecting rows out of it. Hope its clear. Thank you

1条回答
太酷不给撩
2楼-- · 2020-07-10 20:37

include the parameter indicator=True in your merge and query('_merge != "both"')

dfut = pd.merge(df, ut, how='outer',
                left_index=True, right_on=['State', 'RegionName'],
                indicator=True)

dfut.query('_merge != "both"')

enter image description here

查看更多
登录 后发表回答