How to group a python data frame by multilevel row

I have the following multi-level data frame:

Year   2016                    2017                 
Quarter  3   4                 1                 2      
Month  Sep   Oct   Nov   Dec   Jan  Feb    Mar   Apr   May   Jun
A      0.16  0.95  0.92  0.45  0.30  0.35  0.95  0.88  0.18  0.10
B      0.88  0.67  0.07  0.70  0.74  0.33  0.77  0.21  0.81  0.85
C      0.79  0.56  0.13  0.19  0.94  0.23  0.72  0.62  0.66  0.93

I want to sum up over the quarters, so that the final result is as follows:

Year     2016        2017   
Quarter  3     4     1     2
A        0.16  2.32  1.60  1.16
B        0.88  1.44  1.85  1.86
C        0.79  0.89  1.89  2.21

I tried with the following formula:

df= df.groupby('Quarter').transform('sum')

but I get this error:

KeyError: 'Quarter'

Clearly that's the wrong way to approach it. Could anyone please a solution or to approach finding one.

Additional information

The output of the df.index command is: Index([u'A', u'B',u'C'],dtype='object', name=u'DF name')

Thanks!

标签： python pandas pandas-groupby

4条回答

混吃等死

2楼-- · 2019-06-26 03:24

When you use groupby in pandas you group data based on columns data. But you have your groups in rows. All you need is to transpose your df before grouping and after.

Here is code you need:

First let's create df like yours:

import pandas as pd

index = pd.MultiIndex.from_tuples([(2016, 3, 'Sep', 'A'),
                                   (2016, 3, 'Sep', 'B'),
                                   (2016, 3, 'Sep', 'C'),
                                   (2016, 4, 'Oct', 'A'),
                                   (2016, 4, 'Oct', 'B'),
                                   (2016, 4, 'Oct', 'C'),
                                   (2016, 4, 'Nov', 'A'),
                                   (2016, 4, 'Nov', 'B'),
                                   (2016, 4, 'Nov', 'C'),
                                   (2017, 1, 'Jan', 'A'),
                                   (2017, 1, 'Jan', 'B'),
                                   (2017, 1, 'Jan', 'C'),
                                   (2017, 1, 'Feb', 'A'),
                                   (2017, 1, 'Feb', 'B'),
                                   (2017, 1, 'Feb', 'C'),
                                   ], names=['Year', 'Quarter', 'Month', 'Group'])

raw_df = pd.Series(range(15), index=index)

df = raw_df.unstack([0,1,2])
print(df)

Output:

Year    2016         2017    
Quarter    3   4        1    
Month    Sep Oct Nov  Jan Feb
Group                        
A          0   3   6    9  12
B          1   4   7   10  13
C          2   5   8   11  14

Looks exactly like in your example. Now you need only 1 line of code:

new_df = df.transpose().groupby(['Year', 'Quarter']).sum().transpose()
print(new_df)

Here's your output:

Year    2016     2017
Quarter    3   4    1
Group                
A          0   9   21
B          1  11   23
C          2  13   25

Good luck!

0人赞添加讨论(0) 举报

▲ chillily

3楼-- · 2019-06-26 03:25

You can try transpose the data then sum it by the index you want and transpose it back the way you want

df.transpose().sum(level=[0,1]).transpose()

In case you needed to pivot your results.

0人赞添加讨论(0) 举报

【Aperson】

4楼-- · 2019-06-26 03:27

df.sum(level=['Year', 'Quater'], axis=1)

EDIT: thanks Matt Messersmith for note on transpose

Reproducing full example:

import pandas as pd

tuples = [(2016, 3, 'Sep'), (2016, 4, 'Oct'), (2016, 4, 'Nov'),
          (2016, 4, 'Dec'), (2017, 1, 'Jan'), (2017, 1, 'Feb'),
          (2017, 1, 'Mar'), (2017, 2, 'Apr'), (2017, 2, 'May'),
          (2017, 2, 'Jun')]

index = pd.MultiIndex.from_tuples(tuples, names=['Year', 'Quater', 'Month'])

df = pd.DataFrame([(0.16, 0.88, 0.79), (0.95, 0.67, 0.56), (0.92, 0.07, 0.13), 
                   (0.45, 0.70, 0.19), (0.30, 0.74, 0.94), (0.35, 0.33, 0.23), 
                   (0.95, 0.77, 0.72), (0.88, 0.21, 0.62), (0.18, 0.81, 0.66), 
                   (0.10, 0.85, 0.93)], 
                  index=index, 
                  columns=["A", "B", "C"])
df = df.T
print(df.sum(level=['Year', 'Quater'], axis=1))

0人赞添加讨论(0) 举报

啃猪蹄的小仙女

5楼-- · 2019-06-26 03:34

Just using sum

df.sum(level=[0,1],axis=1)
Out[14]: 
year    2016        2017      
quater     3     4     1     2
A       0.16  2.32  1.60  1.16
B       0.88  1.44  1.84  1.87
C       0.79  0.88  1.89  2.21

0人赞添加讨论(0) 举报

How to group a python data frame by multilevel row

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间