可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I have two dataframes of different size (df1 nad df2). I would like to remove from df1 all the rows which are stored within df2.

So if I have df2 equals to:

     A  B
0  wer  6
1  tyu  7

And df1 equals to:

     A  B  C
0  qwe  5  a
1  wer  6  s
2  wer  6  d
3  rty  9  f
4  tyu  7  g
5  tyu  7  h
6  tyu  7  j
7  iop  1  k

The final result should be like so:

     A  B  C
0  qwe  5  a
1  rty  9  f
2  iop  1  k

I was able to achieve my goal by using a for loop but I would like to know if there is a better and more elegant and efficient way to perform such operation.

Here is the code I wrote in case you need it: import pandas as pd

df1 = pd.DataFrame({'A' : ['qwe', 'wer', 'wer', 'rty', 'tyu', 'tyu', 'tyu', 'iop'],
                    'B' : [    5,     6,     6,     9,     7,     7,     7,     1],
                    'C' : ['a'  ,   's',   'd',   'f',   'g',   'h',   'j',   'k']})

df2 = pd.DataFrame({'A' : ['wer', 'tyu'],
                    'B' : [    6,     7]})

for i, row in df2.iterrows():
    df1 = df1[(df1['A']!=row['A']) & (df1['B']!=row['B'])].reset_index(drop=True)

回答1:

Use merge with outer join with filter by query, last remove helper column by drop:

df = pd.merge(df1, df2, on=['A','B'], how='outer', indicator=True)
       .query("_merge != 'both'")
       .drop('_merge', axis=1)
       .reset_index(drop=True)
print (df)
     A  B  C
0  qwe  5  a
1  rty  9  f
2  iop  1  k

回答2:

You can use np.in1d to check if any row in df1 exists in df2. And then use it as a reversed mask to select rows from df1.

df1[~df1[['A','B']].apply(lambda x: np.in1d(x,df2).all(),axis=1)]\
                   .reset_index(drop=True)
Out[115]: 
     A  B  C
0  qwe  5  a
1  rty  9  f
2  iop  1  k

回答3:

pandas has a method called isin, however this relies on unique indices. We can define a lambda function to create columns we can use in this from the existing 'A' and 'B' of df1 and df2. We then negate this (as we want the values not in df2) and reset the index:

import pandas as pd

df1 = pd.DataFrame({'A' : ['qwe', 'wer', 'wer', 'rty', 'tyu', 'tyu', 'tyu', 'iop'],
                    'B' : [    5,     6,     6,     9,     7,     7,     7,     1],
                    'C' : ['a'  ,   's',   'd',   'f',   'g',   'h',   'j',   'k']})

df2 = pd.DataFrame({'A' : ['wer', 'tyu'],
                    'B' : [    6,     7]})

unique_ind = lambda df: df['A'].astype(str) + '_' + df['B'].astype(str)
print df1[~unique_ind(df1).isin(unique_ind(df2))].reset_index(drop=True)

printing:

     A  B  C
0  qwe  5  a
1  rty  9  f
2  iop  1  k

回答4:

The cleanest way I found was to use drop from pandas using the index of the dataframe you want to drop:

df1.drop(df2.index, axis=0,inplace=True)

Remove one dataframe from another with Pandas

问题:

回答1:

回答2:

回答3:

回答4:

收藏的人(0)

Remove one dataframe from another with Pandas

问题:

回答1:

回答2:

回答3:

回答4:

收藏的人(0)

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮