pandas merge with MultiIndex, when only one level

2019-08-02 16:32发布

问题:

I have a data frame called df1 with a 2-level MultiIndex (levels: '_Date' and _'ItemId'). There are multiple instances of each value of '_ItemId', like this:

                              _SomeOtherLabel
 _Date            _ItemId     
 2014-10-05       6588921     AA
                  6592520     AB 
                  6836143     BA
 2014-10-11       6588921     CA
                  6592520     CB
                  6836143     DA 

I have a second data frame called df2 with '_ItemId' used as a key (not the index). In this df, there is only one occurrence of each value of _ItemId:

                  _ItemId       _Cat
  0               6588921       6_1
  1               6592520       6_1
  2               6836143       7_1

I want to recover the values in the column '_Cat' from df2 and merge them into df1 for the appropriate values of '_ItemId'. This is almost (I think?) a standard many-to-one merge, except that the appropriate key for the left df is one of MultiIndex levels. I tried this:

df1['_cat']=pd.merge(df1,df2,left_index=True, right_on='ItemId')  

but I get the error

   "ValueError: len(right_on) must equal the number of levels in the index of "left"

which I suppose makes sense since my (left) index is actually made of two keys. How do I select the one index level that I need? Or is there a better approach to this merge?

Thanks

回答1:

I could think of 2 ways of doing this.

use set_index() and join():

>>> df1.join(df2.set_index('_ItemId'))
                   _SomeOtherLabel _Cat
_Date      _ItemId                     
2014-10-05 6588921              AA  6_1
           6592520              AB  6_1
           6836143              BA  7_1
2014-10-11 6588921              CA  6_1
           6592520              CB  6_1
           6836143              DA  7_1

or use reset_index(), merge() and then set new multiindex

I think first approach should be faster, but not sure.