Adding grouped mean values to column in data frame

2019-02-25 02:21发布

This question already has an answer here:

I'd like to calculate group means in a data frame and create a new column in the original data frame containing those group mean values. (I'm doing a repeatability study and I want the mean value over measurements within an insertion, unit, and channel in a new column so I can subtract it off and calculate residuals.)

My data:

> head(mytestdata,15)
   Insertion Measurement Unit Channel Value
1          1           1   A5      10  9.41
2          1           1   A5      11  9.51
3          1           1   A5      12 10.59
4          1           1   A5      13  9.45
5          1           2   A5      10  9.42
6          1           2   A5      11  9.03
7          1           2   A5      12 10.62
8          1           2   A5      13  9.39
9          1           3   A5      10  9.38
10         1           3   A5      11  9.87
11         1           3   A5      12 11.34
12         1           3   A5      13  9.59
13         2           1   A5      10 12.10
14         2           1   A5      11 11.28
15         2           1   A5      12 12.95

Specifically, I want to calculate the mean Value per Insertion, Unit, and Channel, and add it to the data frame as meanValue. Then subtract meanValue from Value to get Residual.

Should look like this:

   Insertion Measurement Unit Channel Value meanValue
1          1           1   40      10 11.79     11.56
2          1           1   40      11 11.01     11.38
3          1           1   40      12 10.86     11.19
4          1           1   40      13 10.29     10.91
5          1           2   40      10 11.47     11.56
6          1           2   40      11 11.84     11.38
7          1           2   40      12 11.39     11.19
8          1           2   40      13 11.25     10.91
9          1           3   40      10 11.42     11.56
10         1           3   40      11 11.28     11.38
11         1           3   40      12 11.31     11.19
12         1           3   40      13 11.18     10.91
13         2           1   40      10 10.97     11.55
14         2           1   40      11 11.78     11.87
15         2           1   40      12 11.48     11.25

I know how to get the group means using by, aggregate, etc, which get me a second list or table with the values in it. I'm also confident I could get what I want using some convoluted looping procedures, but I'm looking to stuff them back in the same data frame in an elegant one- or two-line solution, and I figure there's got to be a way to do it but after days of searching I'm not finding it. I don't want a cumbersome solution because I want it to work well when I scale up to lots more data.

2条回答
趁早两清
2楼-- · 2019-02-25 02:52

Using data.table

library(data.table)
setDT(mytestdata)[, c("MeanValue", "Residual") := {m= mean(Value);list(m, Value-m)}, by=list(Insertion, Unit, Channel)]

mytestdata

#        Insertion Measurement Unit Channel Value MeanValue     Residual
#       1:         1           1   A5      10  9.41  9.403333  0.006666667
#       2:         1           1   A5      11  9.51  9.470000  0.040000000
#       3:         1           1   A5      12 10.59 10.850000 -0.260000000
#       4:         1           1   A5      13  9.45  9.476667 -0.026666667
#       5:         1           2   A5      10  9.42  9.403333  0.016666667
#       6:         1           2   A5      11  9.03  9.470000 -0.440000000
#       7:         1           2   A5      12 10.62 10.850000 -0.230000000
#       8:         1           2   A5      13  9.39  9.476667 -0.086666667
#       9:         1           3   A5      10  9.38  9.403333 -0.023333333
#      10:         1           3   A5      11  9.87  9.470000  0.400000000
#      11:         1           3   A5      12 11.34 10.850000  0.490000000
#      12:         1           3   A5      13  9.59  9.476667  0.113333333
#      13:         2           1   A5      10 12.10 12.100000  0.000000000
#      14:         2           1   A5      11 11.28 11.280000  0.000000000
#      15:         2           1   A5      12 12.95 12.950000  0.000000000
查看更多
可以哭但决不认输i
3楼-- · 2019-02-25 03:02

You can use ave to calculate the groupmeans:

df$MeanValue <- with(df, ave(Value, Insertion, Unit, Channel, FUN = mean))

Then calculate the residuals:

df$Residual <- df$Value - df$MeanValue
df
#   Insertion Measurement Unit Channel Value MeanValue     Residual
#1          1           1   A5      10  9.41  9.403333  0.006666667
#2          1           1   A5      11  9.51  9.470000  0.040000000
#3          1           1   A5      12 10.59 10.850000 -0.260000000
#4          1           1   A5      13  9.45  9.476667 -0.026666667
#5          1           2   A5      10  9.42  9.403333  0.016666667
#6          1           2   A5      11  9.03  9.470000 -0.440000000
#7          1           2   A5      12 10.62 10.850000 -0.230000000
#8          1           2   A5      13  9.39  9.476667 -0.086666667
#9          1           3   A5      10  9.38  9.403333 -0.023333333
#10         1           3   A5      11  9.87  9.470000  0.400000000
#11         1           3   A5      12 11.34 10.850000  0.490000000
#12         1           3   A5      13  9.59  9.476667  0.113333333
#13         2           1   A5      10 12.10 12.100000  0.000000000
#14         2           1   A5      11 11.28 11.280000  0.000000000
#15         2           1   A5      12 12.95 12.950000  0.000000000

Or you could use dplyr

library(dplyr)

df %>% group_by(Insertion, Unit, Channel) %>% mutate(MeanValue = mean(Value), Residual = Value - MeanValue)
查看更多
登录 后发表回答