Python (Pandas) Add subtotal on each lvl of multii

2019-01-17 13:26发布

问题:

Assuming I have the following dataframe:

a       b       c      Sce1     Sce2    Sce3    Sce4    Sce5    Sc6
Animal  Ground  Dog    0.0      0.9     0.5     0.0     0.3     0.4  
Animal  Ground  Cat    0.6      0.5     0.3     0.5     1.0     0.2 
Animal  Air     Eagle  1.0      0.1     0.1     0.6     0.9     0.1 
Animal  Air     Owl    0.3      0.1     0.5     0.3     0.5     0.9     
Object  Metal   Car    0.3      0.3     0.8     0.6     0.5     0.6 
Object  Metal   Bike   0.5      0.1     0.4     0.7     0.4     0.2 
Object  Wood    Chair  0.9      0.6     0.1     0.9     0.2     0.8 
Object  Wood    Table  0.9      0.6     0.6     0.1     0.9     0.7 

I want to create a MultiIndex, which will contain the sum of each lvl. The output will look like this:

a      b      c     Sce1    Sce2    Sce3    Sce4    Sce5    Sce6
Animal              1.9     1.6     1.4     1.3     2.7     1.6 
       Ground       0.6     1.4     0.8     0.5     1.3     0.6 
              Dog   0.0     0.9     0.5     0.0     0.3     0.4 
              Cat   0.6     0.5     0.3     0.5     1.0     0.2 
       Air          1.3     0.2     0.7     0.8     1.4     1.0 
              Eagle 1.0     0.1     0.1     0.6     0.9     0.1 
              Owl   0.3     0.1     0.5     0.3     0.5     0.9 
Object              2.6     1.6     1.8     2.3     2.0     2.3 
       Metal        0.8     0.3     1.1     1.3     0.9     0.8 
              Car   0.3     0.3     0.8     0.6     0.5     0.6 
              Bike  0.5     0.1     0.4     0.7     0.4     0.2 
       Wood         1.8     1.3     0.6     1.0     1.1     1.5 
              Chair 0.9     0.6     0.1     0.9     0.2     0.8 
              Table 0.9     0.6     0.6     0.1     0.9     0.7 

At the moment I am using a loop to create three different dataframes on each level and then manipulate them on excel, as below. So I wanted to take this calculation in python if possible.

for i in range range(0,3):
    df = df.groupby(list(df.columns)[0:lvl], as_index=False).sum()
    return df

Many thanks in advance.

回答1:

With some liberal use of MAGIC

pd.concat([
        df.assign(
            **{x: 'Total' for x in 'abc'[i:]}
        ).groupby(list('abc')).sum() for i in range(4)
    ]).sort_index()

                     Sce1  Sce2  Sce3  Sce4  Sce5  Sc6
a      b      c                                       
Animal Air    Eagle   1.0   0.1   0.1   0.6   0.9  0.1
              Owl     0.3   0.1   0.5   0.3   0.5  0.9
              Total   1.3   0.2   0.6   0.9   1.4  1.0
       Ground Cat     0.6   0.5   0.3   0.5   1.0  0.2
              Dog     0.0   0.9   0.5   0.0   0.3  0.4
              Total   0.6   1.4   0.8   0.5   1.3  0.6
       Total  Total   1.9   1.6   1.4   1.4   2.7  1.6
Object Metal  Bike    0.5   0.1   0.4   0.7   0.4  0.2
              Car     0.3   0.3   0.8   0.6   0.5  0.6
              Total   0.8   0.4   1.2   1.3   0.9  0.8
       Total  Total   2.6   1.6   1.9   2.3   2.0  2.3
       Wood   Chair   0.9   0.6   0.1   0.9   0.2  0.8
              Table   0.9   0.6   0.6   0.1   0.9  0.7
              Total   1.8   1.2   0.7   1.0   1.1  1.5
Total  Total  Total   4.5   3.2   3.3   3.7   4.7  3.9

I can get exactly what you asked for with

pd.concat([
        df.assign(
            **{x: '' for x in 'abc'[i:]}
        ).groupby(list('abc')).sum() for i in range(1, 4)
    ]).sort_index()

                     Sce1  Sce2  Sce3  Sce4  Sce5  Sc6
a      b      c                                       
Animal                1.9   1.6   1.4   1.4   2.7  1.6
       Air            1.3   0.2   0.6   0.9   1.4  1.0
              Eagle   1.0   0.1   0.1   0.6   0.9  0.1
              Owl     0.3   0.1   0.5   0.3   0.5  0.9
       Ground         0.6   1.4   0.8   0.5   1.3  0.6
              Cat     0.6   0.5   0.3   0.5   1.0  0.2
              Dog     0.0   0.9   0.5   0.0   0.3  0.4
Object                2.6   1.6   1.9   2.3   2.0  2.3
       Metal          0.8   0.4   1.2   1.3   0.9  0.8
              Bike    0.5   0.1   0.4   0.7   0.4  0.2
              Car     0.3   0.3   0.8   0.6   0.5  0.6
       Wood           1.8   1.2   0.7   1.0   1.1  1.5
              Chair   0.9   0.6   0.1   0.9   0.2  0.8
              Table   0.9   0.6   0.6   0.1   0.9  0.7

As for the how! I'll leave that as an exercise for the reader.



回答2:

you need to do two group by to get subtotals at every level of aggregation. Then add those back in to the initial DF. Here's a related question.