Insert rows as a result of a groupby operation int

Posted 2019-05-09 23:18

For example, I have a pandas dataframe as follows:

col_1   col_2   col_3  col_4
a       X        5      1
a       Y        3      2
a       Z        6      4
b       X        7      8
b       Y        4      3
b       Z        6      5

For each value in col_1, I want to sum the values in col_3 and col_4 (and many more columns) over the rows where col_2 is X or Z, and insert a new row holding these sums. The output would look like this:

col_1   col_2   col_3  col_4 
a       X        5      1
a       Y        3      2
a       Z        6      4
a       NEW      11     5
b       X        7      8
b       Y        4      3
b       Z        6      5
b       NEW      13     13

Also, there could be more values in col_1 that will need the same treatment, so I can't explicitly reference 'a' and 'b'. I attempted to use a combination of groupby('col_1') and apply(), but I couldn't get it to work. I'm close enough with the below, but I can't get it to put 'NEW' in col_2 and to keep the original value (a or b, etc.) in col_1.

df.append(df[(df['col_2'] == 'X') | (df['col_2'] == 'Z')].groupby('col_1').mean())

Thanks.

2 answers
欢心
Reply #2 · 2019-05-10 00:00

The following code does it:

import pandas as pd

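def sum_group(df):
  # Rows of this group whose col_2 is X or Z
  dfxz = df[df.col_2.isin(['X', 'Z'])]
  # One-row dataframe: the group's key, the 'NEW' label and the two sums
  sum_row = pd.DataFrame(
    [[
      df.col_1.iloc[0],
      'NEW',
      dfxz.col_3.sum(),
      dfxz.col_4.sum()
    ]], columns=df.columns)
  # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
  return pd.concat([df, sum_row], ignore_index=True)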

df = pd.DataFrame([['a', 'X', 5, 1],
                   ['a', 'Y', 3, 2],
                   ['a', 'Z', 6, 4],
                   ['b', 'X', 7, 8],
                   ['b', 'Y', 4, 3],
                   ['b', 'Z', 6, 5]],
                  columns = ['col_1','col_2','col_3','col_4'])

df = df.groupby('col_1').apply(
  sum_group,
  ).reset_index(drop=True)

print(df)

The apply method of the groupby object calls sum_group once per group, passing that group's rows as a dataframe, and then concatenates the returned dataframes into a single dataframe. Inside sum_group, the incoming dataframe is extended with one extra row, sum_row, which holds the sums of col_3 and col_4 over the rows where col_2 is X or Z. The final reset_index(drop=True) replaces the per-group index with a plain running index.
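The question mentions "many more columns", while sum_group names col_3 and col_4 explicitly. As a rough sketch only, the same idea can be written so that every numeric column is summed without listing it. The function name add_sum_rows and its parameters (key, label_col, labels, new_label) are made up for this illustration and are not part of the original answer. It also loops over the groups instead of using apply, which is an assumption about what is easiest to read, not a requirement.

```python
import pandas as pd

def add_sum_rows(df, key="col_1", label_col="col_2",
                 labels=("X", "Z"), new_label="NEW"):
    """Return a copy of df with one extra summary row after each group.

    All names here are hypothetical; they only illustrate the idea.
    """
    # Every numeric column is summed, so none has to be listed by hand.
    numeric_cols = df.select_dtypes("number").columns
    pieces = []
    # sort=False keeps the groups in their order of first appearance.
    for key_value, group in df.groupby(key, sort=False):
        selected = group[group[label_col].isin(labels)]
        # Sum column by column so each column keeps its own dtype.
        row = {col: selected[col].sum() for col in numeric_cols}
        row[key] = key_value
        row[label_col] = new_label
        pieces.append(group)
        pieces.append(pd.DataFrame([row], columns=df.columns))
    return pd.concat(pieces, ignore_index=True)

df = pd.DataFrame([['a', 'X', 5, 1],
                   ['a', 'Y', 3, 2],
                   ['a', 'Z', 6, 4],
                   ['b', 'X', 7, 8],
                   ['b', 'Y', 4, 3],
                   ['b', 'Z', 6, 5]],
                  columns=['col_1', 'col_2', 'col_3', 'col_4'])

result = add_sum_rows(df)
print(result)
```

Non-numeric columns other than col_1 and col_2 would be left empty (NaN) in the summary rows under this sketch, so it only fits data shaped like the example.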

走好不送
Reply #3 · 2019-05-10 00:08

If you can guarantee that X and Z appear only once in a group, you can use a groupby and pd.concat operation:

new = df[df.col_2.isin(['X', 'Z'])]\
      .groupby(['col_1'], as_index=False).sum()\
      .assign(col_2='NEW')

# A stable sort (mergesort) keeps each NEW row after its group's original rows
df = pd.concat([df, new]).sort_values('col_1', kind='mergesort')

df
  col_1 col_2  col_3  col_4
0     a     X      5      1
1     a     Y      3      2
2     a     Z      6      4
0     a   NEW     11      5
3     b     X      7      8
4     b     Y      4      3
5     b     Z      6      5
1     b   NEW     13     13
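
Note that the output above carries repeated index labels (0 and 1 each appear twice), because the NEW rows keep the index they had in new. If a clean running index is wanted, one possible variant is sketched below. The numeric_only=True argument, the stable sort and the final reset_index are additions for this illustration, not part of the answer as posted, and the variable name result is arbitrary.

```python
import pandas as pd

df = pd.DataFrame([['a', 'X', 5, 1],
                   ['a', 'Y', 3, 2],
                   ['a', 'Z', 6, 4],
                   ['b', 'X', 7, 8],
                   ['b', 'Y', 4, 3],
                   ['b', 'Z', 6, 5]],
                  columns=['col_1', 'col_2', 'col_3', 'col_4'])

# Sum only the numeric columns of the X and Z rows, one row per col_1 value.
new = (
    df[df.col_2.isin(['X', 'Z'])]
    .groupby('col_1', as_index=False)
    .sum(numeric_only=True)
    .assign(col_2='NEW')
)

# The stable sort keeps each NEW row after the original rows of its group;
# reset_index(drop=True) then replaces the duplicated labels with 0..n-1.
result = (
    pd.concat([df, new])
    .sort_values('col_1', kind='mergesort')
    .reset_index(drop=True)
)
print(result)
```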