For example, I have a pandas dataframe as follows:
col_1 col_2 col_3 col_4
a X 5 1
a Y 3 2
a Z 6 4
b X 7 8
b Y 4 3
b Z 6 5
And I want to, for each value in col_1, add the values in col_3 and col_4 (and many more columns) that correspond to X and Z from col_2 and create a new row with these values. So the output would be as below:
col_1 col_2 col_3 col_4
a X 5 1
a Y 3 2
a Z 6 4
a NEW 11 5
b X 7 8
b Y 4 3
b Z 6 5
b NEW 13 13
Also, there could be more values in col_1 that will need the same treatment, so I can't explicitly reference 'a' and 'b'. I attempted to use a combination of groupby('col_1') and apply(), but I couldn't get it to work. I'm close enough with the below, but I can't get it to put 'NEW' in col_2 and to keep the original value (a or b, etc.) in col_1.
df.append(df[(df['col_2'] == 'X') | (df['col_2'] == 'Z')].groupby('col_1').mean())
Thanks.
The following code does it:
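A minimal sketch consistent with the description that follows, not necessarily the original listing. The names sum_group and sum_row come from the answer text; value_cols and the list comprehension over the groups are assumptions. The comprehension stands in for groupby(...).apply(sum_group) because how apply treats the grouping column differs between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    "col_1": ["a", "a", "a", "b", "b", "b"],
    "col_2": ["X", "Y", "Z", "X", "Y", "Z"],
    "col_3": [5, 3, 6, 7, 4, 6],
    "col_4": [1, 2, 4, 8, 3, 5],
})

# Assumption: every column other than col_1 and col_2 should be summed.
value_cols = [c for c in df.columns if c not in ("col_1", "col_2")]

def sum_group(group):
    """Return the group with one extra 'NEW' row holding the X + Z sums."""
    mask = group["col_2"].isin(["X", "Z"])
    # One-row dataframe with the summed value columns.
    sum_row = group.loc[mask, value_cols].sum().to_frame().T
    sum_row["col_1"] = group["col_1"].iloc[0]  # keep the group's key
    sum_row["col_2"] = "NEW"
    return pd.concat([group, sum_row[group.columns]])

# Process each group, then concatenate the pieces into a single dataframe.
result = pd.concat(
    [sum_group(group) for _, group in df.groupby("col_1")],
    ignore_index=True,
)
print(result)
```

Because the groups are discovered by groupby, any additional values in col_1 get the same treatment without being referenced explicitly.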
The apply method of the groupby object calls the function sum_group, which returns a dataframe for each group. These dataframes are then concatenated into a single dataframe. sum_group concatenates the incoming dataframe with an additional row, sum_row, that contains the reduced version of the dataframe according to the criteria you stated.

If you can guarantee that X and Z appear only once in a group, you can use a groupby and pd.concat operation:
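One way such a groupby and pd.concat combination might look is sketched below. This is an illustration under assumptions, not necessarily the code the answer originally showed: value_cols, sums, and the final stable sort are my own choices.

```python
import pandas as pd

df = pd.DataFrame({
    "col_1": ["a", "a", "a", "b", "b", "b"],
    "col_2": ["X", "Y", "Z", "X", "Y", "Z"],
    "col_3": [5, 3, 6, 7, 4, 6],
    "col_4": [1, 2, 4, 8, 3, 5],
})

# Assumption: every column other than col_1 and col_2 should be summed.
value_cols = [c for c in df.columns if c not in ("col_1", "col_2")]

# Sum the X and Z rows per col_1 value, then label the resulting rows.
sums = (
    df[df["col_2"].isin(["X", "Z"])]
    .groupby("col_1")[value_cols]
    .sum()
    .reset_index()
)
sums["col_2"] = "NEW"

# Append the new rows; a stable sort on col_1 places each NEW row
# after the original rows of its group.
result = (
    pd.concat([df, sums])
    .sort_values("col_1", kind="stable")
    .reset_index(drop=True)
)
print(result)
```

This avoids calling a Python function per group, which tends to matter when col_1 has many distinct values.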