How to get summary statistics by group

Posted 2019-01-01 06:01

I'm trying to get multiple summary statistics in R/S-PLUS, grouped by a categorical column, in one shot. I found a couple of functions, but all of them compute one statistic per call, like `aggregate()`.

data <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66, 
          71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
grp <- factor(rep(LETTERS[1:4], c(4,6,6,8)))
df <- data.frame(group=grp, dt=data)
# `by` must be a list of grouping vectors, not a bare factor
mg <- aggregate(df$dt, by=list(group=df$group), FUN=mean)
mg <- aggregate(df$dt, by=list(group=df$group), FUN=sum)

What I'm looking for is a way to get multiple statistics for the same group (mean, min, max, standard deviation, etc.) in one call. Is that doable?

Tags: r s
9 answers
心情的温度
#2 · 2019-01-01 06:07

First, it depends on your version of R. If you're past 2.11, you can use aggregate with functions that return multiple results (summary, for instance, or your own function). If not, you can use the answer made by Justin.
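
As a rough sketch of the aggregate approach described above, using the data from the question: `multi_stats` is a hypothetical helper name, not something from the original answer, and the choice of statistics is only an example.

```r
# Same sample data as in the question
data <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,
          71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
grp <- factor(rep(LETTERS[1:4], c(4, 6, 6, 8)))
df <- data.frame(group = grp, dt = data)

# Hypothetical helper: returns several statistics as one named vector
multi_stats <- function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x), sum = sum(x))
}

# One call, one row per group
res <- aggregate(dt ~ group, data = df, FUN = multi_stats)

# aggregate() stores the statistics as a matrix inside the `dt` column;
# do.call(data.frame, ...) flattens it into ordinary columns (dt.mean, dt.sd, ...)
res_flat <- do.call(data.frame, res)
print(res_flat)
```

The flattening step is optional, but it makes the individual statistics easier to reference by column name afterwards.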

不再属于我。
#3 · 2019-01-01 06:08

I'll put in my two cents for tapply().

tapply(df$dt, df$group, summary)

You could write a custom function with the specific statistics you want and pass it in place of summary.
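
A minimal sketch of that idea, assuming the data frame from the question. The anonymous function and the names `stats_by_group` and `stats_table` are illustrative only.

```r
# Same sample data as in the question
data <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,
          71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
grp <- factor(rep(LETTERS[1:4], c(4, 6, 6, 8)))
df <- data.frame(group = grp, dt = data)

# tapply() returns one named vector of statistics per group
stats_by_group <- tapply(df$dt, df$group, function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x))
})

# Stack the per-group vectors into a matrix: one row per group, one column per statistic
stats_table <- do.call(rbind, stats_by_group)
print(stats_table)
```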

明月照影归
#4 · 2019-01-01 06:08

Besides describeBy, the doBy package is another option. It provides much of the functionality of SAS PROC SUMMARY. Details: http://www.statmethods.net/stats/descriptives.html
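
For illustration, a short sketch with doBy's summaryBy() on the data from the question. It assumes the doBy package is installed; the statistics chosen and the name `res` are just examples.

```r
library(doBy)

# Same sample data as in the question
data <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,
          71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
grp <- factor(rep(LETTERS[1:4], c(4, 6, 6, 8)))
df <- data.frame(group = grp, dt = data)

# The formula reads "summarise dt within each level of group";
# FUN returns a named vector, so each statistic gets its own column
res <- summaryBy(dt ~ group, data = df,
                 FUN = function(x) c(mean = mean(x), sd = sd(x),
                                     min = min(x), max = max(x)))
print(res)
```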

十年一品温如言
#5 · 2019-01-01 06:08

I just found a wonderful R package tables. You can tabulate data by as many categories as you desire and calculate multiple statistics for multiple variables - it truly is amazing!

But wait, there's more! The package has functions to generate LaTeX code for your tables for easy import to your documents.
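
A minimal sketch of what this might look like for the data in the question, assuming the tables package is installed. The layout (groups as rows, statistics as columns) is only one possibility.

```r
library(tables)

# Same sample data as in the question
data <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,
          71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
grp <- factor(rep(LETTERS[1:4], c(4, 6, 6, 8)))
df <- data.frame(group = grp, dt = data)

# Left of ~ : rows (one per level of the factor `group`)
# Right of ~: columns (each listed statistic applied to `dt`)
tab <- tabular(group ~ dt * (mean + sd + min + max), data = df)
print(tab)
```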

初与友歌
#6 · 2019-01-01 06:08

After 5 long years I'm sure this answer is not going to receive much attention, but to make the list of options complete, here is one with data.table:

library(data.table)
setDT(df)[ , list(mean_gr = mean(dt), sum_gr = sum(dt)) , by = .(group)]
#   group mean_gr sum_gr
#1:     A      61    244
#2:     B      66    396
#3:     C      68    408
#4:     D      61    488 
牵手、夕阳
#7 · 2019-01-01 06:14

The dplyr package could be a nice alternative for this problem:

library('dplyr')
df %>% group_by(group) %>% summarize(mean=mean(dt), sum=sum(dt))