I have a data file like the following example, but much larger:
names num Y1 Y2
William 1 4.71 7.4
William 2 3.75 8
William 3 4.71 7.9
Katja 1 5.83 8.5
Katja 2 5.17 7.1
Katja 3 6.08 7.4
Aroma 1 4.04 7.5
Aroma 2 5 6.9
Aroma 3 4.3 7.9
...
I have to calculate the mean of Y1 and Y2 over the three rows that share each name (first column), and then make a separate bar chart of the per-name averages for Y1 and for Y2. So the x axis will show the names and the y axis the mean. Could anybody help me with this?
You can also use aggregate. See ?aggregate for further details.
> aggregate(.~names, FUN=mean, data=df[, -2])
names Y1 Y2
1 Aroma 4.446667 7.433333
2 Katja 5.693333 7.666667
3 William 4.390000 7.766667
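For anyone who wants to reproduce this, here is a minimal sketch. It assumes the data frame is called df and holds only the nine sample rows from the question (the real file is larger). The cbind(Y1, Y2) form names the averaged columns explicitly instead of dropping the num column by position, and the result is stored under the name DF.

```r
# Sample rows copied from the question; the real data file is larger.
df <- data.frame(
  names = rep(c("William", "Katja", "Aroma"), each = 3),
  num   = rep(1:3, times = 3),
  Y1    = c(4.71, 3.75, 4.71, 5.83, 5.17, 6.08, 4.04, 5.00, 4.30),
  Y2    = c(7.4, 8.0, 7.9, 8.5, 7.1, 7.4, 7.5, 6.9, 7.9)
)

# Mean of Y1 and Y2 within each name; equivalent to the call above,
# but the columns to average are listed explicitly.
DF <- aggregate(cbind(Y1, Y2) ~ names, data = df, FUN = mean)
DF
```

Groups come back sorted alphabetically by name, which is why Aroma is listed first.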
Take a look at this post for other ways of computing the mean for each group.
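One such base R alternative is tapply, sketched here under the assumption that df holds the sample rows from the question. It returns one named vector per variable rather than a data frame, so this is only an illustration of the idea, not necessarily what the linked post recommends.

```r
# Sample rows copied from the question; the real data file is larger.
df <- data.frame(
  names = rep(c("William", "Katja", "Aroma"), each = 3),
  Y1    = c(4.71, 3.75, 4.71, 5.83, 5.17, 6.08, 4.04, 5.00, 4.30),
  Y2    = c(7.4, 8.0, 7.9, 8.5, 7.1, 7.4, 7.5, 6.9, 7.9)
)

# tapply(values, groups, function): apply mean to Y1 (or Y2) within each name.
mean_Y1 <- tapply(df$Y1, df$names, mean)
mean_Y2 <- tapply(df$Y2, df$names, mean)
mean_Y1
mean_Y2
```

The names of each result vector are the group labels, so they can be passed straight to barplot as the bar labels.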
For the bar plots, use the base R barplot function, although there are other alternatives such as ggplot2 graphics.
DF <- aggregate(.~names, FUN=mean, data=df[, -2]) # store the group means
barplot(DF[,2], names.arg=DF$names, ylab="mean of Y1", las=1) # for Y1
barplot(DF[,3], names.arg=DF$names, ylab="mean of Y2", las=1) # for Y2
which produces one bar chart per variable, with the names on the x axis.
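If both charts should appear in a single figure, one option is to split the plotting device with par(mfrow = ...). This is only a sketch: DF is rebuilt here from the means printed above so that the snippet runs on its own, and the side-by-side layout is an assumption about what is wanted.

```r
# DF is rebuilt from the group means shown above so this runs standalone.
DF <- data.frame(
  names = c("Aroma", "Katja", "William"),
  Y1    = c(4.446667, 5.693333, 4.390000),
  Y2    = c(7.433333, 7.666667, 7.766667)
)

old_par <- par(mfrow = c(1, 2))  # one row, two panels; keep the old settings
mid_Y1 <- barplot(DF$Y1, names.arg = DF$names, ylab = "mean of Y1", las = 1)
mid_Y2 <- barplot(DF$Y2, names.arg = DF$names, ylab = "mean of Y2", las = 1)
par(old_par)                     # restore the previous layout
```

barplot returns the x positions of the bar midpoints (kept here as mid_Y1 and mid_Y2), which is handy if you later want to add text or error bars above each bar.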
As you are very new to R, I recommend reading An Introduction to R, which is a good starting point for learning the basics of R.
Using the sqldf package (assuming df is your table):
library(sqldf)
sqldf("SELECT names, avg(Y1) as mean_Y1, avg(Y2) as mean_Y2 FROM df GROUP BY names")