How to calculate the mean of specific rows in R?

Posted 2019-03-06 10:22

Question:

I have a data file like the following example, but much larger:

names    num  Y1    Y2
William  1    4.71  7.4
William  2    3.75  8
William  3    4.71  7.9
Katja    1    5.83  8.5
Katja    2    5.17  7.1
Katja    3    6.08  7.4
Aroma    1    4.04  7.5
Aroma    2    5     6.9
Aroma    3    4.3   7.9
...
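To try the answers on the sample rows, one option is to paste them into read.table. This is a minimal sketch, assuming the table is whitespace-separated with a header row and that the data frame is called df, the name the answers use. The file name in the comment is hypothetical.

```r
# Paste the sample rows as text; the real, larger file could be read with
# read.table("your_file.txt", header = TRUE)  (hypothetical file name)
df <- read.table(text = "
names    num  Y1    Y2
William  1    4.71  7.4
William  2    3.75  8
William  3    4.71  7.9
Katja    1    5.83  8.5
Katja    2    5.17  7.1
Katja    3    6.08  7.4
Aroma    1    4.04  7.5
Aroma    2    5     6.9
Aroma    3    4.3   7.9
", header = TRUE, stringsAsFactors = FALSE)

str(df)  # 9 rows; columns names, num, Y1, Y2
```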

I need to calculate the mean of Y1 and of Y2 for each name, i.e. over the three rows that share the same name in the first column. Then I want to make a bar chart of each name's averages, one chart for Y1 and one for Y2, with the names on the x axis and the mean on the y axis. Could anybody help me with this?
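Because every name in the sample occupies exactly three consecutive rows, the per-name means can also be computed purely by position. This sketch assumes a fixed block size of 3 and rows already sorted by name; it gives wrong results if any name has a different number of rows, so the grouping approaches in the answers are more robust.

```r
# Sample data from the question
df <- data.frame(
  names = rep(c("William", "Katja", "Aroma"), each = 3),
  num   = rep(1:3, times = 3),
  Y1    = c(4.71, 3.75, 4.71, 5.83, 5.17, 6.08, 4.04, 5, 4.3),
  Y2    = c(7.4, 8, 7.9, 8.5, 7.1, 7.4, 7.5, 6.9, 7.9),
  stringsAsFactors = FALSE
)

block <- 3  # assumption: each name spans exactly 3 consecutive rows
stopifnot(nrow(df) %% block == 0)

# matrix() fills column by column, so each column holds one name's 3 values
y1_means <- colMeans(matrix(df$Y1, nrow = block))
y2_means <- colMeans(matrix(df$Y2, nrow = block))

# The name of each block is taken from its first row
first_rows <- seq(1, nrow(df), by = block)
res <- data.frame(names = df$names[first_rows], Y1 = y1_means, Y2 = y2_means,
                  stringsAsFactors = FALSE)
print(res)
```

The rows come out in file order (William, Katja, Aroma) rather than alphabetically.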

Answer 1:

You can also use aggregate. See ?aggregate for further details.

> aggregate(.~names, FUN=mean, data=df[, -2])
    names       Y1       Y2
1   Aroma 4.446667 7.433333
2   Katja 5.693333 7.666667
3 William 4.390000 7.766667
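The call above drops the num column by position (df[, -2]). A variant that names the columns to average instead, so it keeps working if the column order changes, uses the cbind(...) ~ group formula form that aggregate accepts. A minimal sketch on the sample data:

```r
# Sample data from the question
df <- data.frame(
  names = rep(c("William", "Katja", "Aroma"), each = 3),
  num   = rep(1:3, times = 3),
  Y1    = c(4.71, 3.75, 4.71, 5.83, 5.17, 6.08, 4.04, 5, 4.3),
  Y2    = c(7.4, 8, 7.9, 8.5, 7.1, 7.4, 7.5, 6.9, 7.9),
  stringsAsFactors = FALSE
)

# cbind() on the left-hand side averages both columns at once, grouped by names
DF <- aggregate(cbind(Y1, Y2) ~ names, data = df, FUN = mean)
print(DF)
```

This should print the same three rows as the output shown above.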

Take a look at this post for other ways of taking the mean for each group.
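One base-R alternative is tapply, which splits a vector by a grouping variable, applies a function to each piece, and returns one value per group, labelled with the group name. A minimal sketch on the sample data:

```r
# Sample data from the question
df <- data.frame(
  names = rep(c("William", "Katja", "Aroma"), each = 3),
  num   = rep(1:3, times = 3),
  Y1    = c(4.71, 3.75, 4.71, 5.83, 5.17, 6.08, 4.04, 5, 4.3),
  Y2    = c(7.4, 8, 7.9, 8.5, 7.1, 7.4, 7.5, 6.9, 7.9),
  stringsAsFactors = FALSE
)

# One mean per name; the groups come out sorted alphabetically
mean_y1 <- tapply(df$Y1, df$names, mean)
mean_y2 <- tapply(df$Y2, df$names, mean)
print(round(mean_y1, 3))
print(round(mean_y2, 3))
```

Because the results carry the names, barplot(mean_y1) labels the bars with the names without needing names.arg.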

For the bar plots, use base R's barplot function, although there are other alternatives such as ggplot2 graphics.

DF <- aggregate(. ~ names, FUN = mean, data = df[, -2])  # store the per-name means
barplot(DF$Y1, names.arg = DF$names, ylab = "mean of Y1", las = 1)  # for Y1
barplot(DF$Y2, names.arg = DF$names, ylab = "mean of Y2", las = 1)  # for Y2

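These calls draw one bar chart for Y1 and one for Y2, each with the names on the x axis and the mean on the y axis.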
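If both charts should appear in one figure, base graphics can split the plotting device into panels with par(mfrow = ...). This is a minimal sketch, assuming DF is built with the same aggregate call as in this answer:

```r
# Sample data from the question
df <- data.frame(
  names = rep(c("William", "Katja", "Aroma"), each = 3),
  num   = rep(1:3, times = 3),
  Y1    = c(4.71, 3.75, 4.71, 5.83, 5.17, 6.08, 4.04, 5, 4.3),
  Y2    = c(7.4, 8, 7.9, 8.5, 7.1, 7.4, 7.5, 6.9, 7.9),
  stringsAsFactors = FALSE
)
DF <- aggregate(. ~ names, FUN = mean, data = df[, -2])

# One row, two columns of panels; keep the old settings so they can be restored
old_par <- par(mfrow = c(1, 2))
# barplot() invisibly returns the x positions of the bar midpoints
mid_y1 <- barplot(DF$Y1, names.arg = DF$names, ylab = "mean of Y1", las = 1)
mid_y2 <- barplot(DF$Y2, names.arg = DF$names, ylab = "mean of Y2", las = 1)
par(old_par)  # back to a single panel
```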

As you are very new to R, I recommend reading An Introduction to R, which is a good starting point for learning the basics of R.



Answer 2:

Using the sqldf package (assuming df is your data frame):

library(sqldf)
sqldf("SELECT names, avg(Y1) as mean_Y1, avg(Y2) as mean_Y2 FROM df GROUP BY names")
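For comparison, roughly the same table (one row per name with mean_Y1 and mean_Y2 columns) can be produced without SQL. This sketch uses the data-frame method of aggregate plus a rename. It is a base-R counterpart to the query, not part of sqldf.

```r
# Sample data from the question
df <- data.frame(
  names = rep(c("William", "Katja", "Aroma"), each = 3),
  num   = rep(1:3, times = 3),
  Y1    = c(4.71, 3.75, 4.71, 5.83, 5.17, 6.08, 4.04, 5, 4.3),
  Y2    = c(7.4, 8, 7.9, 8.5, 7.1, 7.4, 7.5, 6.9, 7.9),
  stringsAsFactors = FALSE
)

# Average each Y column within each name (like avg(...) ... GROUP BY names)
res <- aggregate(df[, c("Y1", "Y2")], by = list(names = df$names), FUN = mean)
# Rename to match the aliases used in the SQL query
names(res) <- c("names", "mean_Y1", "mean_Y2")
print(res)
```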


Tags: r mean