I've a csv like this:
client1,client2,client3,client4,client5,client6,amount
,,,Comp1,,,4.475000
,,,Comp2,,,16.305584
,,,Comp3,,,4.050000
Comp2,Comp1,,Comp4,,,21.000000
,,,Comp4,,,30.000000
,Comp1,,Comp2,,,5.137500
,,,Comp3,,,52.650000
,,,Comp1,,,2.650000
Comp3,,,Comp3,,,29.000000
Comp5,,,Comp2,,,20.809000
Comp5,,,Comp2,,,15.100000
Comp5,,,Comp2,,,52.404000
After reading it into a pandas dataframe, df, I wanted to aggregate in two steps:
Step1:
First, I sum the amount:
client1 client2 client3 client4 client5 client6 amount
Comp1 7.125000
Comp2 16.305584
Comp3 56.700000
Comp4 30.000000
Comp1 Comp2 5.137500
Comp2 Comp1 Comp4 21.000000
Comp3 Comp3 29.000000
Comp5 Comp2 88.313000
Then, I want to aggregate by each client name such that if multiple clients are involved like in group 5, then 5.1375 must be split equally between Comp1 and Comp2. Tried it this way:
df.groupby(['client1','client2','client3','client4','client5','client6']).apply(lambda x: x['amount'].sum()/len(x) if x.any().nunique()>=1 else x['amount'].sum())
client1 client2 client3 client4 client5 client6 0
0 Comp1 3.562500
1 Comp2 16.305584
2 Comp3 28.350000
3 Comp4 30.000000
4 Comp1 Comp2 5.137500
5 Comp2 Comp1 Comp4 21.000000
6 Comp3 Comp3 29.000000
7 Comp5 Comp2 29.437667
Expected Output is:
Client Amount
Comp1 4.475+21/3+5.1375/2+2.65 = 16.69375
Comp2 16.305584+21/3+20.809/2+15.10/2+52.404/2 = 67.462084
Comp3 4.05+52.65+29 = 85.7
Comp4 21/3+30 = 37
Comp5 20.809/2+15.10/2+52.404/2 = 44.1565
I tried using sum(axis=0)
but of no use.