I have a data frame where each line represents an individual. That data frame has two variables: age and year. I want to make a table of average ages per year. How can I do it?
The best I could come up with was xtabs(age ~ year, dataframe)
, but this gives me the sum of ages per year.
Use aggregate
:
xtabs(hp~cyl+gear,aggregate(hp~cyl+gear,mtcars,mean))
gear
cyl 3 4 5
4 97.0000 76.0000 102.0000
6 107.5000 116.5000 175.0000
8 194.1667 0.0000 299.5000
Have a look at the plyr package, specifically ddply
ddply(dataframe, .(year), summarise, mean(age))
If you have to use xtabs... see the above answer.
I'm glad that xtabs
solution works for you. I must have a package loaded that interferes with it for me (and gives an error). Another solution would be:
tapply(dfrm$age, dfrm$year, FUN=mean)
To get additional dimensions to the table (array) just keep adding additional factors to the second INDEX argument in the form list(fac1, fac2, fac3)
.
Applied to the example using mtcars:
tapply(mtcars$hp, list(mtcars$cyl,mtcars$gear), mean)
3 4 5
4 97.0000 76.0 102.0
6 107.5000 116.5 175.0
8 194.1667 NA 299.5
Or even more compactly:
with(mtcars, tapply(hp, list(cyl, gear), mean))
Another solution is to use 2 xtabs functions, first one to calculate the sum of the ages, 2nd one to get the headcount for each group, then get the mean by sum/headcount.
E.g.,
xtabs(age ~ year, dataframe)/xtabs( ~ year, dataframe)
However, the aggregate approach can be used to calculate median or other functions applied to the dataframe. Definitely more flexible.