Adding grouped mean values to column in data frame

This question already has an answer here:

Calculating statistics on subsets of data [duplicate] 3 answers

I'd like to calculate group means in a data frame and create a new column in the original data frame containing those group mean values. (I'm doing a repeatability study and I want the mean value over measurements within an insertion, unit, and channel in a new column so I can subtract it off and calculate residuals.)

My data:

> head(mytestdata,15)
   Insertion Measurement Unit Channel Value
1          1           1   A5      10  9.41
2          1           1   A5      11  9.51
3          1           1   A5      12 10.59
4          1           1   A5      13  9.45
5          1           2   A5      10  9.42
6          1           2   A5      11  9.03
7          1           2   A5      12 10.62
8          1           2   A5      13  9.39
9          1           3   A5      10  9.38
10         1           3   A5      11  9.87
11         1           3   A5      12 11.34
12         1           3   A5      13  9.59
13         2           1   A5      10 12.10
14         2           1   A5      11 11.28
15         2           1   A5      12 12.95

Specifically, I want to calculate the mean Value per Insertion, Unit, and Channel, and add it to the data frame as meanValue. Then subtract meanValue from Value to get Residual.

Should look like this:

   Insertion Measurement Unit Channel Value meanValue
1          1           1   40      10 11.79     11.56
2          1           1   40      11 11.01     11.38
3          1           1   40      12 10.86     11.19
4          1           1   40      13 10.29     10.91
5          1           2   40      10 11.47     11.56
6          1           2   40      11 11.84     11.38
7          1           2   40      12 11.39     11.19
8          1           2   40      13 11.25     10.91
9          1           3   40      10 11.42     11.56
10         1           3   40      11 11.28     11.38
11         1           3   40      12 11.31     11.19
12         1           3   40      13 11.18     10.91
13         2           1   40      10 10.97     11.55
14         2           1   40      11 11.78     11.87
15         2           1   40      12 11.48     11.25

I know how to get the group means using by, aggregate, etc, which get me a second list or table with the values in it. I'm also confident I could get what I want using some convoluted looping procedures, but I'm looking to stuff them back in the same data frame in an elegant one- or two-line solution, and I figure there's got to be a way to do it but after days of searching I'm not finding it. I don't want a cumbersome solution because I want it to work well when I scale up to lots more data.

标签： r dataframe aggregate

2条回答

趁早两清

2楼-- · 2019-02-25 02:52

Using data.table

library(data.table)
setDT(mytestdata)[, c("MeanValue", "Residual") := {m= mean(Value);list(m, Value-m)}, by=list(Insertion, Unit, Channel)]

mytestdata

#        Insertion Measurement Unit Channel Value MeanValue     Residual
#       1:         1           1   A5      10  9.41  9.403333  0.006666667
#       2:         1           1   A5      11  9.51  9.470000  0.040000000
#       3:         1           1   A5      12 10.59 10.850000 -0.260000000
#       4:         1           1   A5      13  9.45  9.476667 -0.026666667
#       5:         1           2   A5      10  9.42  9.403333  0.016666667
#       6:         1           2   A5      11  9.03  9.470000 -0.440000000
#       7:         1           2   A5      12 10.62 10.850000 -0.230000000
#       8:         1           2   A5      13  9.39  9.476667 -0.086666667
#       9:         1           3   A5      10  9.38  9.403333 -0.023333333
#      10:         1           3   A5      11  9.87  9.470000  0.400000000
#      11:         1           3   A5      12 11.34 10.850000  0.490000000
#      12:         1           3   A5      13  9.59  9.476667  0.113333333
#      13:         2           1   A5      10 12.10 12.100000  0.000000000
#      14:         2           1   A5      11 11.28 11.280000  0.000000000
#      15:         2           1   A5      12 12.95 12.950000  0.000000000

0人赞添加讨论(0) 举报

可以哭但决不认输i

3楼-- · 2019-02-25 03:02

You can use ave to calculate the groupmeans:

df$MeanValue <- with(df, ave(Value, Insertion, Unit, Channel, FUN = mean))

Then calculate the residuals:

df$Residual <- df$Value - df$MeanValue
df
#   Insertion Measurement Unit Channel Value MeanValue     Residual
#1          1           1   A5      10  9.41  9.403333  0.006666667
#2          1           1   A5      11  9.51  9.470000  0.040000000
#3          1           1   A5      12 10.59 10.850000 -0.260000000
#4          1           1   A5      13  9.45  9.476667 -0.026666667
#5          1           2   A5      10  9.42  9.403333  0.016666667
#6          1           2   A5      11  9.03  9.470000 -0.440000000
#7          1           2   A5      12 10.62 10.850000 -0.230000000
#8          1           2   A5      13  9.39  9.476667 -0.086666667
#9          1           3   A5      10  9.38  9.403333 -0.023333333
#10         1           3   A5      11  9.87  9.470000  0.400000000
#11         1           3   A5      12 11.34 10.850000  0.490000000
#12         1           3   A5      13  9.59  9.476667  0.113333333
#13         2           1   A5      10 12.10 12.100000  0.000000000
#14         2           1   A5      11 11.28 11.280000  0.000000000
#15         2           1   A5      12 12.95 12.950000  0.000000000

Or you could use dplyr

library(dplyr)

df %>% group_by(Insertion, Unit, Channel) %>% mutate(MeanValue = mean(Value), Residual = Value - MeanValue)

0人赞添加讨论(0) 举报

Adding grouped mean values to column in data frame

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间