Getting max values from pandas multiindex dataframe

Posted 2019-05-12 08:59

Question:

I'm trying to retrieve only the max values (including the multi-index values) from a pandas dataframe that has a MultiIndex. The data I have is generated via a groupby and column selection ('tOfmAJyI') like this:

df.groupby('id')['tOfmAJyI'].value_counts()

Out[4]: 
id     tOfmAJyI
3      mlNXN       4
       SSvEP       2
       hCIpw       2
5      SSvEP       2
       hCIpw       1
       mlNXN       1
11     mlNXN       2
       SSvEP       1
...

What I would like to achieve is to get the max values including their corresponding index values. So something like:

id     tOfmAJyI
3      mlNXN       4
5      SSvEP       2
11     mlNXN       2
...

Any ideas how I can achieve this? I was able to get the id and max value but I'm still trying to get the corresponding value of 'tOfmAJyI'.

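Answer 1:

groupby + head (here df is the Series returned by value_counts above, which is already sorted in descending order within each id, so the first row per id is the max)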

df.groupby(level=0).head(1)
Out[1882]: 
id  tOfmAJyI
3   mlNXN       4
5   SSvEP       2
11  mlNXN       2
Name: V, dtype: int64

Or

df.loc[df.groupby(level=0).idxmax()]
Out[1888]: 
id  tOfmAJyI
3   mlNXN       4
5   SSvEP       2
11  mlNXN       2
Name: V, dtype: int64
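
Both approaches above can be tried end to end with the following sketch. The raw rows are an assumption: they are reconstructed only from the counts visible in the question (the rows elided by "..." are not included), and the variable names are made up for illustration. The idxmax result is converted to a list of tuples before it is passed to .loc, which is a cautious variation on the one-liner above rather than a requirement.

```python
import pandas as pd

# Raw rows reconstructed from the counts shown in the question (assumed data).
df = pd.DataFrame({
    "id": [3] * 8 + [5] * 4 + [11] * 3,
    "tOfmAJyI": (
        ["mlNXN"] * 4 + ["SSvEP"] * 2 + ["hCIpw"] * 2   # id 3
        + ["SSvEP"] * 2 + ["hCIpw", "mlNXN"]            # id 5
        + ["mlNXN"] * 2 + ["SSvEP"]                     # id 11
    ),
})

# Series with a two-level MultiIndex (id, tOfmAJyI); counts are sorted
# in descending order within each id.
counts = df.groupby("id")["tOfmAJyI"].value_counts()

# Approach 1: take the first row of each id group.
top_by_head = counts.groupby(level=0).head(1)

# Approach 2: idxmax gives one (id, tOfmAJyI) tuple per id; look those up.
top_keys = counts.groupby(level=0).idxmax()
top_by_idxmax = counts.loc[top_keys.tolist()]

print(top_by_head)
print(top_by_idxmax)
```

The head(1) variant depends on the rows already being ordered by count within each group, while the idxmax variant does not.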


Answer 2:

I had the same problem with my code. Instead of using max, sort the values with ascending=False, then use groupby(level=0).head(1). Below is the code that worked for me, followed by a suggestion for your code.

table = pd.pivot_table(df, index= ['Site', 'DayofWeek'], values= ['CTR'])

table = table.sort_values(by = 'CTR', ascending = False)

table.groupby(level=0).head(1)

I first tried loc with .apply(max) or idxmax(), but an error occurred: 'Indexing a MultiIndex with a DataFrame key is not implemented'. To avoid this, use the method suggested above.

Your code:
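
A likely cause of that error, assuming the pivot table was built as shown: values=['CTR'] produces a DataFrame, so groupby(level=0).idxmax() also returns a DataFrame, and .loc does not accept a DataFrame as a key on a MultiIndex. The sketch below shows one possible workaround, which is to select the column first so that idxmax returns a Series of index tuples. The sample rows and the names best_keys and best_rows are invented for illustration; only the column names come from the code above.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer.
df = pd.DataFrame({
    "Site": ["a", "a", "a", "b", "b"],
    "DayofWeek": ["Mon", "Tue", "Mon", "Mon", "Tue"],
    "CTR": [0.10, 0.30, 0.20, 0.50, 0.40],
})

# pivot_table averages CTR per (Site, DayofWeek) by default.
table = pd.pivot_table(df, index=["Site", "DayofWeek"], values=["CTR"])

# Selecting the column gives a Series, so idxmax returns a Series of
# (Site, DayofWeek) tuples instead of a DataFrame.
best_keys = table["CTR"].groupby(level=0).idxmax()

# A plain list of tuples is a valid .loc key for a MultiIndex.
best_rows = table.loc[best_keys.tolist()]
print(best_rows)
```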

table = df.groupby('id')['tOfmAJyI'].value_counts()

table = table.sort_values(ascending = False)  # table is a Series here, so sort_values takes no 'by' argument
table.groupby(level=0).head(1)
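
The sort-then-head idea can be checked on a Series whose counts are deliberately not pre-sorted. This is a sketch under assumptions: the Series is built by hand from the counts visible in the question, the row order is shuffled on purpose, and the names counts, ordered and top are illustrative. Note that Series.sort_values sorts by the values themselves and has no by parameter.

```python
import pandas as pd

# Counts from the question, entered in a deliberately unsorted order.
index = pd.MultiIndex.from_tuples(
    [
        (3, "SSvEP"), (3, "mlNXN"), (3, "hCIpw"),
        (5, "hCIpw"), (5, "SSvEP"), (5, "mlNXN"),
        (11, "SSvEP"), (11, "mlNXN"),
    ],
    names=["id", "tOfmAJyI"],
)
counts = pd.Series([2, 4, 2, 1, 2, 1, 1, 2], index=index)

# Sort all rows by count, largest first.
ordered = counts.sort_values(ascending=False)

# The first row seen for each id is now that id's maximum;
# sort_index restores the id order for display.
top = ordered.groupby(level=0).head(1).sort_index()
print(top)
```

When several labels tie for the maximum within one id, head(1) keeps only one of them.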