Replacing values in a pandas multi-index

2020-04-08 12:59发布

问题:

I have a dataframe with a multi-index. I want to change the value of the 2nd index when certain conditions on the first index are met. I found a similar (but different) question here: Replace a value in MultiIndex (pandas) which doesn't answer my point because that was about changing a single row, and the solution passed the value of the first index (which didn't need changing), too. In my case I am dealing with multiple rows and I haven't been able to adapt that solution to my case.

A minimal example of my data is below. Thanks!

import pandas as pd
import numpy as np

consdf=pd.DataFrame()

for mylocation in ['North','South']:
    for scenario in np.arange(1,4):
        df= pd.DataFrame()
        df['mylocation'] = [mylocation]
        df['scenario']= [scenario]
        df['this'] = np.random.randint(10,100)
        df['that'] = df['this']  * 2
        df['something else']  = df['this'] * 3
        consdf=pd.concat((consdf, df ), axis=0, ignore_index=True)

mypiv = consdf.pivot('mylocation','scenario').transpose()

level_list =['this','that']
# if level 0 is in level_list --> set level 1 to np.nan
mypiv.iloc[mypiv.index.get_level_values(0).isin(level_list)].index.set_levels([np.nan], level =1, inplace=True)

The last line doesn't work: I get:

ValueError: On level 1, label max (2) >= length of level  (1). NOTE: this index is in an inconsistent state

回答1:

IIUC you could add new value to level values, and then change labels for your index, using advanced indexing, get_level_values, set_levels and set_labels methods:

len_ind = len(mypiv.loc[(level_list,)].index.get_level_values(1))
mypiv.index.set_levels([1, 2, 3, np.nan], level=1, inplace=True)
mypiv.index.set_labels([3]*len_ind + mypiv.index.labels[1][len_ind:].tolist(), level=1, inplace=True)

In [219]: mypiv
Out[219]: 
mylocation               North  South
               scenario              
this           NaN          26     46
               NaN          32     67
               NaN          75     30
that           NaN          52     92
               NaN          64    134
               NaN         150     60
something else  1.0         78    138
                2.0         96    201
                3.0        225     90

Note You values for other scenario will convert to float because it should be one type and np.nan has float type.



回答2:

Note: ix has been deprecated in Pandas 0.20+. Use loc accessor instead.

Here is a solution, using reset_index() method:

In [95]: new = mypiv.reset_index()

In [96]: new
Out[96]:
mylocation         level_0  scenario  North  South
0                     this         1     32     64
1                     this         2     18     40
2                     this         3     76     56
3                     that         1     64    128
4                     that         2     36     80
5                     that         3    152    112
6           something else         1     96    192
7           something else         2     54    120
8           something else         3    228    168

In [100]: new.ix[new.level_0.isin(level_list), 'scenario'] = np.nan

In [101]: new
Out[101]:
mylocation         level_0  scenario  North  South
0                     this       NaN     32     64
1                     this       NaN     18     40
2                     this       NaN     76     56
3                     that       NaN     64    128
4                     that       NaN     36     80
5                     that       NaN    152    112
6           something else       1.0     96    192
7           something else       2.0     54    120
8           something else       3.0    228    168

In [103]: mypiv = new.set_index(['level_0', 'scenario'])

In [104]: mypiv
Out[104]:
mylocation               North  South
level_0        scenario
this           NaN          32     64
               NaN          18     40
               NaN          76     56
that           NaN          64    128
               NaN          36     80
               NaN         152    112
something else 1.0          96    192
               2.0          54    120
               3.0         228    168

But I suspect there is a more elegant solution.