How to group a python data frame by multilevel row

2019-06-26 02:43发布

问题:

I have the following multi-level data frame:

Year   2016                    2017                 
Quarter  3   4                 1                 2      
Month  Sep   Oct   Nov   Dec   Jan  Feb    Mar   Apr   May   Jun
A      0.16  0.95  0.92  0.45  0.30  0.35  0.95  0.88  0.18  0.10
B      0.88  0.67  0.07  0.70  0.74  0.33  0.77  0.21  0.81  0.85
C      0.79  0.56  0.13  0.19  0.94  0.23  0.72  0.62  0.66  0.93

I want to sum up over the quarters, so that the final result is as follows:

Year     2016        2017   
Quarter  3     4     1     2
A        0.16  2.32  1.60  1.16
B        0.88  1.44  1.85  1.86
C        0.79  0.89  1.89  2.21

I tried with the following formula:

df= df.groupby('Quarter').transform('sum')

but I get this error:

KeyError: 'Quarter'

Clearly that's the wrong way to approach it. Could anyone please a solution or to approach finding one.

Additional information

The output of the df.index command is: Index([u'A', u'B',u'C'],dtype='object', name=u'DF name')

Thanks!

回答1:

Just using sum

df.sum(level=[0,1],axis=1)
Out[14]: 
year    2016        2017      
quater     3     4     1     2
A       0.16  2.32  1.60  1.16
B       0.88  1.44  1.84  1.87
C       0.79  0.88  1.89  2.21


回答2:

When you use groupby in pandas you group data based on columns data. But you have your groups in rows. All you need is to transpose your df before grouping and after.

Here is code you need:

First let's create df like yours:

import pandas as pd

index = pd.MultiIndex.from_tuples([(2016, 3, 'Sep', 'A'),
                                   (2016, 3, 'Sep', 'B'),
                                   (2016, 3, 'Sep', 'C'),
                                   (2016, 4, 'Oct', 'A'),
                                   (2016, 4, 'Oct', 'B'),
                                   (2016, 4, 'Oct', 'C'),
                                   (2016, 4, 'Nov', 'A'),
                                   (2016, 4, 'Nov', 'B'),
                                   (2016, 4, 'Nov', 'C'),
                                   (2017, 1, 'Jan', 'A'),
                                   (2017, 1, 'Jan', 'B'),
                                   (2017, 1, 'Jan', 'C'),
                                   (2017, 1, 'Feb', 'A'),
                                   (2017, 1, 'Feb', 'B'),
                                   (2017, 1, 'Feb', 'C'),
                                   ], names=['Year', 'Quarter', 'Month', 'Group'])

raw_df = pd.Series(range(15), index=index)

df = raw_df.unstack([0,1,2])
print(df)

Output:

Year    2016         2017    
Quarter    3   4        1    
Month    Sep Oct Nov  Jan Feb
Group                        
A          0   3   6    9  12
B          1   4   7   10  13
C          2   5   8   11  14

Looks exactly like in your example. Now you need only 1 line of code:

new_df = df.transpose().groupby(['Year', 'Quarter']).sum().transpose()
print(new_df)

Here's your output:

Year    2016     2017
Quarter    3   4    1
Group                
A          0   9   21
B          1  11   23
C          2  13   25

Good luck!



回答3:

df.sum(level=['Year', 'Quater'], axis=1)

EDIT: thanks Matt Messersmith for note on transpose

Reproducing full example:

import pandas as pd

tuples = [(2016, 3, 'Sep'), (2016, 4, 'Oct'), (2016, 4, 'Nov'),
          (2016, 4, 'Dec'), (2017, 1, 'Jan'), (2017, 1, 'Feb'),
          (2017, 1, 'Mar'), (2017, 2, 'Apr'), (2017, 2, 'May'),
          (2017, 2, 'Jun')]

index = pd.MultiIndex.from_tuples(tuples, names=['Year', 'Quater', 'Month'])

df = pd.DataFrame([(0.16, 0.88, 0.79), (0.95, 0.67, 0.56), (0.92, 0.07, 0.13), 
                   (0.45, 0.70, 0.19), (0.30, 0.74, 0.94), (0.35, 0.33, 0.23), 
                   (0.95, 0.77, 0.72), (0.88, 0.21, 0.62), (0.18, 0.81, 0.66), 
                   (0.10, 0.85, 0.93)], 
                  index=index, 
                  columns=["A", "B", "C"])
df = df.T
print(df.sum(level=['Year', 'Quater'], axis=1))


回答4:

You can try transpose the data then sum it by the index you want and transpose it back the way you want

df.transpose().sum(level=[0,1]).transpose()

In case you needed to pivot your results.