I have a simple data.frame
> df <- data.frame(a=c(3,5,7), b=c(5,3,7), c=c(5,6,4))
> df
a b c
1 3 5 5
2 5 3 6
3 7 7 4
Is there a simple and efficient way to get a new data.frame with the same number of rows but with the mean of, for example, column a and b by row? something like this:
mean.of.a.and.b c
1 4 5
2 4 6
3 7 4
Use rowMeans()
on the first two columns only. Then cbind()
to the third column.
cbind(mean.of.a.and.b = rowMeans(df[-3]), df[3])
# mean.of.a.and.b c
# 1 4 5
# 2 4 6
# 3 7 4
Note: If you have any NA values in your original data, you may want to use na.rm = TRUE
in rowMeans()
. See ?rowMeans
for more.
Another option using the dplyr
package:
library("dplyr")
df %>%
rowwise()%>%
mutate(mean.of.a.and.b = mean(c(a, b))) %>%
## Then if you want to remove a and b:
select(-a, -b)
I think the best option is using rowMeans()
posted by Richard Scriven. rowMeans and rowSums are equivalent to use of apply with FUN = mean or FUN = sum but a lot faster. I post the version with apply just for reference, in case we would like to pass another function.
data.frame(mean.of.a.and.b = apply(df[-3], 1, mean), c = df[3])
Output:
mean.of.a.and.b c
1 4 5
2 4 6
3 7 4
Very verbose using SQL with sqldf
library(sqldf
sqldf("SELECT (sum(a)+sum(b))/(count(a)+count(b)) as mean, c
FROM df group by c")
Output:
mean c
1 7 4
2 4 5
3 4 6