I have a pandas dataframe containing the following data:
matchID server court speed
1 1 A 100
1 2 D 200
1 3 D 300
1 4 A 100
1 1 A 120
1 2 A 250
1 3 D 110
1 4 D 100
2 1 A 100
2 2 D 200
2 3 D 300
2 4 A 100
2 1 A 120
2 2 A 250
2 3 D 110
2 4 D 100
I would like to add two new columns containing the mean based on two conditions. The column meanSpeedCourtA13
shall contain the mean speed of servers
1 and 3 where court = A
. This would bee (100 + 120) / 2 = 110
. The second column named meanSpeedCourtD13
shall contain the mean speed of servers
1 and 3 where court = D
. This would be (300 + 110) / 2 = 205
.
Please note that this should be done for each matchID
, hence, a groupby is also required. this means that solutions containing iloc()
cannot be used.
The resulting dataframe should look as follows:
matchID server court speed meanSpeedCourtA13 meanSpeedCourtD13
1 1 A 100 110 205
1 2 D 200 110 205
1 3 D 300 110 205
1 4 A 100 110 205
1 1 A 120 110 205
1 2 A 250 110 205
1 3 D 110 110 205
1 4 D 100 110 205
2 1 A 100 110 205
2 2 D 200 110 205
2 3 D 300 110 205
2 4 A 100 110 205
2 1 A 120 110 205
2 2 A 250 110 205
2 3 D 110 110 205
2 4 D 100 110 205
With
groupby
, we can still useloc
to select the intended parts that we want to replace but put the whole computation within a for loop fromdf.groupby("matchID")
.Specail thanks to @Dark to point it out that I was hard coding
groupby
.For
loc
, it can be used to select values based on information from 2 axes: rows and columns. By convention on the documentation, the sequence to put information is rows first and columns second. For example, indf.loc[df.matchID==id, "meanSpeedCourtD13"]
,df.matchID==id
is about selecting rows that havematchID
beingid
and that"meanSpeedCourtD13"
specifies a column we want to look into.Side notes about calculating mean:
subg
where(subg.server.isin([1,3]))
then filter out server not in [1 ,3].where(subg.court == "A")
further to do filtering on court.mean
to compute mean from speed.As an alternative, you can use
np.where
to assign values to each matchID in [1, 2]. This works only for binarymatchID
. It is roughly the same speed with thegroupby
method above tested on my computer. To save space, we only demonstrate with"meanSpeedCourtA13"
column.For
np.where(condition, x, y)
, it will return x if condition is met, y otherwise. See np.where for documentation.You can get the
mean
bygroupby
and assign the values by getting the item() i.eOk this got a bit more complicated. Normally I'd try something with transform but I'd be glad if someone had something better than the following:
Use
groupby
and send df to func wheredf.loc
is used, lastly usepd.concat
to glue the dataframe together again:Returns