I want to calculate the mean (or any other summary statistic of length one, e.g. min, max, length, sum) of a numeric variable ("value") within each level of a grouping variable ("group").
The summary statistic should be assigned to a new variable that has the same length as the original data. That is, each row of the original data should get the value corresponding to its group; the data set should not be collapsed to one row per group. For example, consider the group mean:
Before:
id  group  value
1   a      10
2   a      20
3   b      100
4   b      200

After:
id  group  value  grp.mean.values
1   a      10     15
2   a      20     15
3   b      100    150
4   b      200    150
Here is another option, using the base functions aggregate and merge:
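A minimal sketch of the aggregate and merge approach, assuming the example data is stored in a data frame called dat (a hypothetical name). aggregate collapses the data to one row per group, and merge then joins those per-group means back onto every original row.

```r
# Example data from the question ("dat" is an illustrative name)
dat <- data.frame(id = 1:4,
                  group = c("a", "a", "b", "b"),
                  value = c(10, 20, 100, 200))

# One row per group: columns "group" and "value" (the group mean)
grp_means <- aggregate(value ~ group, data = dat, FUN = mean)

# Join the means back onto every row. Both inputs have a "value" column,
# so merge() renames them to "value.x" (original) and "value.y" (mean)
res <- merge(dat, grp_means, by = "group")

# merge() sorts by the key column by default, so restore the original row order
res <- res[order(res$id), ]
res
```

Swapping FUN = mean for min, max, length or sum gives the other statistics.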
You can get "better" column names with the suffixes argument of merge:
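A sketch of the same join using the suffixes argument of merge; the suffix ".grp.mean" is an arbitrary choice for illustration. An empty first suffix keeps the original column name, so only the merged-in mean column is renamed.

```r
dat <- data.frame(id = 1:4,
                  group = c("a", "a", "b", "b"),
                  value = c(10, 20, 100, 200))

# suffixes = c("", ".grp.mean"): dat's column stays "value" and the
# merged-in group mean becomes "value.grp.mean"
res <- merge(dat, aggregate(value ~ group, data = dat, FUN = mean),
             by = "group", suffixes = c("", ".grp.mean"))
names(res)
```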
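Have a look at the ave function. By default it computes the mean within each group and returns a vector with the same length as the input, so the result can be assigned directly as a new column. If you want to use ave to calculate something else per group, you need to specify FUN = your-desired-function, e.g. FUN = min.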
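A minimal sketch with ave, again using a hypothetical data frame dat holding the example data. The signature is ave(x, ..., FUN = mean): the grouping variable(s) go through ..., which is why FUN must be passed by name. A function passed positionally would be taken as a grouping argument rather than as FUN.

```r
dat <- data.frame(id = 1:4,
                  group = c("a", "a", "b", "b"),
                  value = c(10, 20, 100, 200))

# Default FUN is mean; the result has one element per row of dat
dat$grp.mean.values <- ave(dat$value, dat$group)

# Any other statistic must be supplied as a named FUN argument
dat$grp.min.values <- ave(dat$value, dat$group, FUN = min)
dat
```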
You may do this in dplyr using mutate:
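A sketch of the dplyr route, assuming dplyr is installed. group_by() defines the groups, and mutate() adds a column without dropping any rows. The trailing ungroup() is an optional addition so that later operations on the result are not silently grouped.

```r
library(dplyr)

dat <- data.frame(id = 1:4,
                  group = c("a", "a", "b", "b"),
                  value = c(10, 20, 100, 200))

res <- dat %>%
  group_by(group) %>%
  mutate(grp.mean.values = mean(value)) %>%  # repeated on every row of the group
  ungroup()
res
```

Replacing mean(value) with min(value), sum(value) or n() gives the other statistics.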
...or use data.table to assign the new column by reference (:=):
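A sketch with data.table, assuming the package is installed; dt is an illustrative name. Because := modifies dt in place, the result does not need to be assigned back. An existing data frame can be converted in place with setDT() first.

```r
library(data.table)

dt <- data.table(id = 1:4,
                 group = c("a", "a", "b", "b"),
                 value = c(10, 20, 100, 200))

# Add grp.mean.values by reference, computing mean(value) within each group;
# rows keep their original order
dt[, grp.mean.values := mean(value), by = group]
dt
```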
One option is to use plyr:
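A sketch of the plyr approach, assuming plyr is installed; this particular call is an illustration, not necessarily the answer's original code. Passing transform as the function makes ddply add the new column to each group's rows instead of collapsing each group to a single row.

```r
library(plyr)

dat <- data.frame(id = 1:4,
                  group = c("a", "a", "b", "b"),
                  value = c(10, 20, 100, 200))

# Split dat by "group", run transform() on each piece (which evaluates
# mean(value) inside that piece), then stack the pieces into one data frame
res <- ddply(dat, "group", transform, grp.mean.values = mean(value))
res
```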
ddply expects a data.frame (the first d) and returns a data.frame (the second d). Other XXply functions work in a similar way: e.g. ldply expects a list and returns a data.frame, dlply does the opposite (expects a data.frame and returns a list), and so on. The second argument is the grouping variable(s). The third argument is the function we want to apply to each group.