I have a dataframe with a multi-index. I want to change the value of the 2nd index when certain conditions on the first index are met.
I found a similar (but different) question here: Replace a value in MultiIndex (pandas)
which doesn't answer my point because that was about changing a single row, and the solution passed the value of the first index (which didn't need changing), too. In my case I am dealing with multiple rows and I haven't been able to adapt that solution to my case.
A minimal example of my data is below. Thanks!
import pandas as pd
import numpy as np
consdf=pd.DataFrame()
for mylocation in ['North','South']:
for scenario in np.arange(1,4):
df= pd.DataFrame()
df['mylocation'] = [mylocation]
df['scenario']= [scenario]
df['this'] = np.random.randint(10,100)
df['that'] = df['this'] * 2
df['something else'] = df['this'] * 3
consdf=pd.concat((consdf, df ), axis=0, ignore_index=True)
mypiv = consdf.pivot('mylocation','scenario').transpose()
level_list =['this','that']
# if level 0 is in level_list --> set level 1 to np.nan
mypiv.iloc[mypiv.index.get_level_values(0).isin(level_list)].index.set_levels([np.nan], level =1, inplace=True)
The last line doesn't work: I get:
ValueError: On level 1, label max (2) >= length of level (1). NOTE: this index is in an inconsistent state
IIUC you could add new value to level values, and then change labels for your index, using advanced indexing, get_level_values
, set_levels
and set_labels
methods:
len_ind = len(mypiv.loc[(level_list,)].index.get_level_values(1))
mypiv.index.set_levels([1, 2, 3, np.nan], level=1, inplace=True)
mypiv.index.set_labels([3]*len_ind + mypiv.index.labels[1][len_ind:].tolist(), level=1, inplace=True)
In [219]: mypiv
Out[219]:
mylocation North South
scenario
this NaN 26 46
NaN 32 67
NaN 75 30
that NaN 52 92
NaN 64 134
NaN 150 60
something else 1.0 78 138
2.0 96 201
3.0 225 90
Note You values for other scenario
will convert to float because it should be one type and np.nan
has float type.
Note: ix
has been deprecated in Pandas 0.20+. Use loc
accessor instead.
Here is a solution, using reset_index()
method:
In [95]: new = mypiv.reset_index()
In [96]: new
Out[96]:
mylocation level_0 scenario North South
0 this 1 32 64
1 this 2 18 40
2 this 3 76 56
3 that 1 64 128
4 that 2 36 80
5 that 3 152 112
6 something else 1 96 192
7 something else 2 54 120
8 something else 3 228 168
In [100]: new.ix[new.level_0.isin(level_list), 'scenario'] = np.nan
In [101]: new
Out[101]:
mylocation level_0 scenario North South
0 this NaN 32 64
1 this NaN 18 40
2 this NaN 76 56
3 that NaN 64 128
4 that NaN 36 80
5 that NaN 152 112
6 something else 1.0 96 192
7 something else 2.0 54 120
8 something else 3.0 228 168
In [103]: mypiv = new.set_index(['level_0', 'scenario'])
In [104]: mypiv
Out[104]:
mylocation North South
level_0 scenario
this NaN 32 64
NaN 18 40
NaN 76 56
that NaN 64 128
NaN 36 80
NaN 152 112
something else 1.0 96 192
2.0 54 120
3.0 228 168
But I suspect there is a more elegant solution.