I have the following multi-level data frame:
Year 2016 2017
Quarter 3 4 1 2
Month Sep Oct Nov Dec Jan Feb Mar Apr May Jun
A 0.16 0.95 0.92 0.45 0.30 0.35 0.95 0.88 0.18 0.10
B 0.88 0.67 0.07 0.70 0.74 0.33 0.77 0.21 0.81 0.85
C 0.79 0.56 0.13 0.19 0.94 0.23 0.72 0.62 0.66 0.93
I want to sum up over the quarters, so that the final result is as follows:
Year 2016 2017
Quarter 3 4 1 2
A 0.16 2.32 1.60 1.16
B 0.88 1.44 1.85 1.86
C 0.79 0.89 1.89 2.21
I tried with the following formula:
df= df.groupby('Quarter').transform('sum')
but I get this error:
KeyError: 'Quarter'
Clearly that's the wrong way to approach it. Could anyone please a solution or to approach finding one.
Additional information
The output of the df.index
command is: Index([u'A', u'B',u'C'],dtype='object', name=u'DF name')
Thanks!
Just using sum
df.sum(level=[0,1],axis=1)
Out[14]:
year 2016 2017
quater 3 4 1 2
A 0.16 2.32 1.60 1.16
B 0.88 1.44 1.84 1.87
C 0.79 0.88 1.89 2.21
When you use groupby in pandas you group data based on columns data. But you have your groups in rows. All you need is to transpose your df before grouping and after.
Here is code you need:
First let's create df like yours:
import pandas as pd
index = pd.MultiIndex.from_tuples([(2016, 3, 'Sep', 'A'),
(2016, 3, 'Sep', 'B'),
(2016, 3, 'Sep', 'C'),
(2016, 4, 'Oct', 'A'),
(2016, 4, 'Oct', 'B'),
(2016, 4, 'Oct', 'C'),
(2016, 4, 'Nov', 'A'),
(2016, 4, 'Nov', 'B'),
(2016, 4, 'Nov', 'C'),
(2017, 1, 'Jan', 'A'),
(2017, 1, 'Jan', 'B'),
(2017, 1, 'Jan', 'C'),
(2017, 1, 'Feb', 'A'),
(2017, 1, 'Feb', 'B'),
(2017, 1, 'Feb', 'C'),
], names=['Year', 'Quarter', 'Month', 'Group'])
raw_df = pd.Series(range(15), index=index)
df = raw_df.unstack([0,1,2])
print(df)
Output:
Year 2016 2017
Quarter 3 4 1
Month Sep Oct Nov Jan Feb
Group
A 0 3 6 9 12
B 1 4 7 10 13
C 2 5 8 11 14
Looks exactly like in your example. Now you need only 1 line of code:
new_df = df.transpose().groupby(['Year', 'Quarter']).sum().transpose()
print(new_df)
Here's your output:
Year 2016 2017
Quarter 3 4 1
Group
A 0 9 21
B 1 11 23
C 2 13 25
Good luck!
df.sum(level=['Year', 'Quater'], axis=1)
EDIT: thanks Matt Messersmith for note on transpose
Reproducing full example:
import pandas as pd
tuples = [(2016, 3, 'Sep'), (2016, 4, 'Oct'), (2016, 4, 'Nov'),
(2016, 4, 'Dec'), (2017, 1, 'Jan'), (2017, 1, 'Feb'),
(2017, 1, 'Mar'), (2017, 2, 'Apr'), (2017, 2, 'May'),
(2017, 2, 'Jun')]
index = pd.MultiIndex.from_tuples(tuples, names=['Year', 'Quater', 'Month'])
df = pd.DataFrame([(0.16, 0.88, 0.79), (0.95, 0.67, 0.56), (0.92, 0.07, 0.13),
(0.45, 0.70, 0.19), (0.30, 0.74, 0.94), (0.35, 0.33, 0.23),
(0.95, 0.77, 0.72), (0.88, 0.21, 0.62), (0.18, 0.81, 0.66),
(0.10, 0.85, 0.93)],
index=index,
columns=["A", "B", "C"])
df = df.T
print(df.sum(level=['Year', 'Quater'], axis=1))
You can try transpose the data then sum
it by the index you want and transpose it back the way you want
df.transpose().sum(level=[0,1]).transpose()
In case you needed to pivot your results.