This question already has an answer here:
I have a data.frame
and I need to calculate the mean per group (i.e. per Month
, below).
Name Month Rate1 Rate2
Aira 1 12 23
Aira 2 18 73
Aira 3 19 45
Ben 1 53 19
Ben 2 22 87
Ben 3 19 45
Cat 1 22 87
Cat 2 67 43
Cat 3 45 32
My desired output is like below, where the values for Rate1
and Rate2
are the group means. Please disregard the value, I have made it up for the example.
Name Rate1 Rate2
Aira 23.21 12.2
Ben 45.23 43.9
Cat 33.22 32.2
You can also use package
plyr
, which is somehow more versatile:Or use
group_by
&summarise_at
from thedplyr
package:See
?summarise_at
for the many ways to specify the variables to act on. Here,vars(-Month)
says all variables exceptMonth
.Here are a variety of ways to do this in base
R
including an alternativeaggregate
approach. The examples below return means per month, which I think is what you requested. Although, the same approach could be used to return means per person:Using
ave
:Using
by
:Using
lapply
andsplit
:Using
sapply
andsplit
:Using
aggregate
:You could also use the generic function
cbind()
andlm()
without the intercept:You can also accomplish this using the
sqldf
package as shown below:I am a recent convert to
dplyr
as shown in other answers, butsqldf
is nice as most data analysts/data scientists/developers have at least some fluency in SQL. In this way, I think it tends to make for more universally readable code thandplyr
or other solutions presented above.UPDATE: In responding to the comment below, I attempted to update the code as shown above. However, the behavior was not as I expected. It seems that the column definition (i.e.
int
vsfloat
) is only carried through when the column alias matches the original column name. When you specify a new name, the aggregate column is returned without rounding.I describe two ways to do this, one based on data.table and the other based on reshape2 package . The data.table way already has an answer, but I have tried to make it cleaner and more detailed.
The data is like this:
Using data.table:
There is another way of doing it by avoiding to write many argument for j in data.table using a .SD
if we only want to have Rate1 and Rate2 then we can use the .SDcols as follows: