This question already has an answer here:
-
how to calculate mean/median per group in a dataframe in r [duplicate]
5 answers
-
Summarizing multiple columns with dplyr? [duplicate]
5 answers
I want to average the values by their IDs but not all ID's have the same number of values. How do I do this in R?
I have two columns ID and Value
ID Value
1000 0.51
1000 0.01
1001 0.81
1001 0.41
1001 0.62
1002 0.98
1002 0.12
1002 0.15
1003 0.12
... ...
You can try by()
:
> with(df, by(Value, ID, mean))
# ID: 1000
# [1] 0.26
# ------------------------------------------------------------
# ID: 1001
# [1] 0.6133333
# ------------------------------------------------------------
# ID: 1002
# [1] 0.4166667
# ------------------------------------------------------------
# ID: 1003
# [1] 0.12
or aggregate()
:
> aggregate( Value ~ ID, df, mean)
# ID Value
# 1 1000 0.2600000
# 2 1001 0.6133333
# 3 1002 0.4166667
# 4 1003 0.1200000
or using data.table
(if you need fast calculation on large data sets):
> library(data.table)
> setDT(df)[, mean(Value), by = ID]
# ID V1
# 1: 1000 0.2600000
# 2: 1001 0.6133333
# 3: 1002 0.4166667
# 4: 1003 0.1200000
data
df <- structure(list(ID = c(1000L, 1000L, 1001L, 1001L, 1001L, 1002L,
1002L, 1002L, 1003L), Value = c(0.51, 0.01, 0.81, 0.41, 0.62,
0.98, 0.12, 0.15, 0.12)), .Names = c("ID", "Value"),
class = "data.frame", row.names = c(NA, -9L))
You could use the package dplyr
and the function summarise_each
:
df=data.frame(ID=c(1000,1000,1001,1001,1001,1002,1002,1002,1003), Value=c(0.51,0.01,0.81,0.41,0.62,0.98,0.12,0.15,0.12))
library(dplyr)
newdf <- df %>% group_by(ID) %>% summarise_each(funs(mean))
which gives you:
ID Value
(dbl) (dbl)
1 1000 0.2600000
2 1001 0.6133333
3 1002 0.4166667
4 1003 0.1200000
If you deal with large datasets this should be the most efficient way of doing this task.
Using sqldf
:
library(sqldf)
sqldf("SELECT ID, avg(Value) Mean
FROM df
GROUP BY ID")
Output:
ID Mean
1 1000 0.2600000
2 1001 0.6133333
3 1002 0.4166667
4 1003 0.1200000
With dplyr
, instead of summarise_each
as Cleb pointed out, we can just use summarise
:
df %>% group_by(ID) %>% summarise(mean = mean(Value))
#or
summarise(group_by(df, ID), mean = mean(Value))
Output:
ID mean
(int) (dbl)
1 1000 0.2600000
2 1001 0.6133333
3 1002 0.4166667
4 1003 0.1200000