I'd like to calculate group means in a data frame and create a new column in the original data frame containing those group mean values. (I'm doing a repeatability study and I want the mean value over measurements within an insertion, unit, and channel in a new column so I can subtract it off and calculate residuals.)
My data:
> head(mytestdata,15)
   Insertion Measurement Unit Channel Value
1          1           1   A5      10  9.41
2          1           1   A5      11  9.51
3          1           1   A5      12 10.59
4          1           1   A5      13  9.45
5          1           2   A5      10  9.42
6          1           2   A5      11  9.03
7          1           2   A5      12 10.62
8          1           2   A5      13  9.39
9          1           3   A5      10  9.38
10         1           3   A5      11  9.87
11         1           3   A5      12 11.34
12         1           3   A5      13  9.59
13         2           1   A5      10 12.10
14         2           1   A5      11 11.28
15         2           1   A5      12 12.95
Specifically, I want to calculate the mean Value per Insertion, Unit, and Channel, and add it to the data frame as meanValue. I then want to subtract meanValue from Value to get Residual.
The result should look like this:
   Insertion Measurement Unit Channel Value meanValue
1          1           1   40      10 11.79     11.56
2          1           1   40      11 11.01     11.38
3          1           1   40      12 10.86     11.19
4          1           1   40      13 10.29     10.91
5          1           2   40      10 11.47     11.56
6          1           2   40      11 11.84     11.38
7          1           2   40      12 11.39     11.19
8          1           2   40      13 11.25     10.91
9          1           3   40      10 11.42     11.56
10         1           3   40      11 11.28     11.38
11         1           3   40      12 11.31     11.19
12         1           3   40      13 11.18     10.91
13         2           1   40      10 10.97     11.55
14         2           1   40      11 11.78     11.87
15         2           1   40      12 11.48     11.25
I know how to get the group means using by, aggregate, etc., but those give me a separate list or table with the values in it. I'm also confident I could get what I want with some convoluted looping, but I'd like to put the means back into the same data frame with an elegant one- or two-line solution. I figure there has to be a way to do it, but after days of searching I haven't found it. I don't want a cumbersome solution because it needs to work well when I scale up to much more data.
One option is data.table, which can add the group mean as a new column by reference.
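The code for this answer did not survive, so what follows is only a sketch of how the data.table approach might look. The toy data frame reuses the first eight rows shown in the question, and the name `dt` is just an illustrative choice.

```r
library(data.table)

# Toy data: the first eight rows from the question
mytestdata <- data.frame(
  Insertion   = rep(1, 8),
  Measurement = rep(1:2, each = 4),
  Unit        = "A5",
  Channel     = rep(10:13, times = 2),
  Value       = c(9.41, 9.51, 10.59, 9.45, 9.42, 9.03, 10.62, 9.39)
)

# Work on a data.table copy so the original data frame is left untouched
dt <- as.data.table(mytestdata)

# := adds the column by reference; mean(Value) is recycled within each group
dt[, meanValue := mean(Value), by = .(Insertion, Unit, Channel)]

# Residual is each measurement's deviation from its group mean
dt[, Residual := Value - meanValue]
```

Because `:=` modifies the table in place rather than building a summary and merging it back, this tends to scale well to larger data.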
You can use ave to calculate the group means, then subtract them from Value to get the residuals.
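As a minimal sketch of the base R route (the original code is missing, so this is a reconstruction under the assumption that the grouping is by Insertion, Unit, and Channel as the question asks), using the first eight rows of the question's data:

```r
# Toy data: the first eight rows from the question
mytestdata <- data.frame(
  Insertion   = rep(1, 8),
  Measurement = rep(1:2, each = 4),
  Unit        = "A5",
  Channel     = rep(10:13, times = 2),
  Value       = c(9.41, 9.51, 10.59, 9.45, 9.42, 9.03, 10.62, 9.39)
)

# ave() returns a vector as long as its input, holding each row's group mean
# (its default FUN is mean), so it can be assigned straight to a new column
mytestdata$meanValue <- with(mytestdata, ave(Value, Insertion, Unit, Channel))

# Residuals: subtract the group mean from each measurement
mytestdata$Residual <- mytestdata$Value - mytestdata$meanValue
```

To use a different summary, pass it by name, for example `ave(Value, Insertion, Unit, Channel, FUN = median)`; an unnamed extra argument would be treated as another grouping variable.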
Or you could use dplyr, grouping with group_by and adding the columns with mutate.
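The dplyr code was also lost, so here is a hedged sketch of what it could look like, again on the first eight rows of the question's data. The name `result` is only illustrative.

```r
library(dplyr)

# Toy data: the first eight rows from the question
mytestdata <- data.frame(
  Insertion   = rep(1, 8),
  Measurement = rep(1:2, each = 4),
  Unit        = "A5",
  Channel     = rep(10:13, times = 2),
  Value       = c(9.41, 9.51, 10.59, 9.45, 9.42, 9.03, 10.62, 9.39)
)

result <- mytestdata %>%
  group_by(Insertion, Unit, Channel) %>%
  # mutate() keeps every row, unlike summarise(), and computes per group
  mutate(meanValue = mean(Value),
         Residual  = Value - meanValue) %>%
  # drop the grouping so later operations are not silently grouped
  ungroup()
```

The key point is using `mutate` rather than `summarise`: the former keeps one row per measurement, which is what is needed to compute residuals.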