How to sum a variable by group?

2018-12-30 22:23发布

Let's say I have two columns of data. The first contains categories such as "First", "Second", "Third", etc. The second has numbers which represent the number of times I saw "First".

For example:

Category     Frequency
First        10
First        15
First        5
Second       2
Third        14
Third        20
Second       3

I want to sort the data by Category and sum the Frequencies:

Category     Frequency
First        30
Second       5
Third        34

How would I do this in R?

标签: r sorting r-faq
13条回答
人气声优
2楼-- · 2018-12-30 22:58

Using aggregate:

aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum)
  Category  x
1    First 30
2   Second  5
3    Third 34

In the example above, multiple dimensions can be specified in the list. Multiple aggregated metrics of the same data type can be incorporated via cbind:

aggregate(cbind(x$Frequency, x$Metric2, x$Metric3) ...

(embedding @thelatemail comment), aggregate has a formula interface too

aggregate(Frequency ~ Category, x, sum)

Or if you want to aggregate multiple columns, you could use the . notation (works for one column too)

aggregate(. ~ Category, x, sum)

or tapply:

tapply(x$Frequency, x$Category, FUN=sum)
 First Second  Third 
    30      5     34 

Using this data:

x <- data.frame(Category=factor(c("First", "First", "First", "Second",
                                      "Third", "Third", "Second")), 
                    Frequency=c(10,15,5,2,14,20,3))
查看更多
高级女魔头
3楼-- · 2018-12-30 22:58

You could use the function group.sum from package Rfast.

Category <- Rfast::as_integer(Category,result.sort=FALSE) # convert character to numeric. R's as.numeric produce NAs.
result <- Rfast::group.sum(Frequency,Category)
names(result) <- Rfast::Sort(unique(Category)
# 30 5 34

Rfast has many group functions and group.sum is one of them.

查看更多
不再属于我。
4楼-- · 2018-12-30 23:00

Just to add a third option:

require(doBy)
summaryBy(Frequency~Category, data=yourdataframe, FUN=sum)

EDIT: this is a very old answer. Now I would recommend the use of group_by and summarise from dplyr, as in @docendo answer.

查看更多
伤终究还是伤i
5楼-- · 2018-12-30 23:02

Several years later, just to add another simple base R solution that isn't present here for some reason- xtabs

xtabs(Frequency ~ Category, df)
# Category
# First Second  Third 
#    30      5     34 

Or if want a data.frame back

as.data.frame(xtabs(Frequency ~ Category, df))
#   Category Freq
# 1    First   30
# 2   Second    5
# 3    Third   34
查看更多
爱死公子算了
6楼-- · 2018-12-30 23:03
library(plyr)
ddply(tbl, .(Category), summarise, sum = sum(Frequency))
查看更多
裙下三千臣
7楼-- · 2018-12-30 23:08

using cast instead of recast (note 'Frequency' is now 'value')

df  <- data.frame(Category = c("First","First","First","Second","Third","Third","Second")
                  , value = c(10,15,5,2,14,20,3))

install.packages("reshape")

result<-cast(df, Category ~ . ,fun.aggregate=sum)

to get:

Category (all)
First     30
Second    5
Third     34
查看更多
登录 后发表回答