How can I get xtabs to calculate means instead of

2019-02-06 18:09发布

问题:

I have a data frame where each line represents an individual. That data frame has two variables: age and year. I want to make a table of average ages per year. How can I do it?

The best I could come up with was xtabs(age ~ year, dataframe), but this gives me the sum of ages per year.

回答1:

Use aggregate:

xtabs(hp~cyl+gear,aggregate(hp~cyl+gear,mtcars,mean))
   gear
cyl        3        4        5
  4  97.0000  76.0000 102.0000
  6 107.5000 116.5000 175.0000
  8 194.1667   0.0000 299.5000


回答2:

Have a look at the plyr package, specifically ddply

ddply(dataframe, .(year), summarise, mean(age))

If you have to use xtabs... see the above answer.



回答3:

I'm glad that xtabs solution works for you. I must have a package loaded that interferes with it for me (and gives an error). Another solution would be:

tapply(dfrm$age, dfrm$year, FUN=mean)

To get additional dimensions to the table (array) just keep adding additional factors to the second INDEX argument in the form list(fac1, fac2, fac3).

Applied to the example using mtcars:

 tapply(mtcars$hp, list(mtcars$cyl,mtcars$gear), mean)
         3     4     5
4  97.0000  76.0 102.0
6 107.5000 116.5 175.0
8 194.1667    NA 299.5

Or even more compactly:

with(mtcars, tapply(hp, list(cyl, gear), mean))


回答4:

Another solution is to use 2 xtabs functions, first one to calculate the sum of the ages, 2nd one to get the headcount for each group, then get the mean by sum/headcount.

E.g., xtabs(age ~ year, dataframe)/xtabs( ~ year, dataframe)

However, the aggregate approach can be used to calculate median or other functions applied to the dataframe. Definitely more flexible.



标签: r average mean