Count number of rows within each group

Posted 2018-12-31 02:40

I have a data frame and I would like to count the number of rows within each group. I regularly use the aggregate function to sum data as follows:

df2 <- aggregate(x ~ Year + Month, data = df1, sum)

Now, I would like to count observations but can't seem to find the proper argument for FUN. Intuitively, I thought it would be as follows:

df2 <- aggregate(x ~ Year + Month, data = df1, count)

But, no such luck.

Any ideas?


Some toy data:

set.seed(2)
df1 <- data.frame(x = 1:20,
                  Year = sample(2012:2014, 20, replace = TRUE),
                  Month = sample(month.abb[1:3], 20, replace = TRUE))

12 answers
谁念西风独自凉
Reply #2 · 2018-12-31 03:26

Following @Joshua's suggestion, here's one way you might count the number of observations in your df dataframe where Year = 2007 and Month = Nov (assuming they are columns):

nrow(df[df$Year == 2007 & df$Month == "Nov", ])

and with aggregate, following @GregSnow:

aggregate(x ~ Year + Month, data = df, FUN = length)
牵手、夕阳
Reply #3 · 2018-12-31 03:28

An alternative to aggregate() in this case is table() combined with as.data.frame(), which also reports the combinations of year and month that have zero occurrences:

df<-data.frame(x=rep(1:6,rep(c(1,2,3),2)),year=1993:2004,month=c(1,1:11))

myAns<-as.data.frame(table(df[,c("year","month")]))

And without the zero-occurring combinations

myAns[which(myAns$Freq>0),]
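To make the zero-count behaviour concrete, here is a minimal sketch on a three-row data frame. The name d and its values are made up for illustration and are not part of the question's data.

```r
# Hypothetical toy data: two Jan 2012 rows and one Feb 2013 row
d <- data.frame(Year  = c(2012, 2012, 2013),
                Month = c("Jan", "Jan", "Feb"))

# table() cross-tabulates every Year x Month combination,
# so combinations that never occur appear with Freq = 0
counts <- as.data.frame(table(d[, c("Year", "Month")]))

# Keep only the combinations that actually occur
nonzero <- counts[counts$Freq > 0, ]

print(counts)
print(nonzero)
```

With two years and two months, counts has four rows, two of which have a frequency of zero; nonzero keeps the other two.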
骚的不知所云
Reply #4 · 2018-12-31 03:31

Regarding @Ben's answer: R would throw an error if df1 does not contain an x column. This can be solved elegantly with paste:

aggregate(paste(Year, Month) ~ Year + Month, data = df1, FUN = NROW)

Similarly, it can be generalized if more than two variables are used in grouping:

aggregate(paste(Year, Month, Day) ~ Year + Month + Day, data = df1, FUN = NROW)
ら面具成の殇う
Reply #5 · 2018-12-31 03:33

The simplest option with aggregate is the length function, which gives the length of the vector in each subset. A slightly more robust choice is function(x) sum(!is.na(x)), which counts only the non-missing values.
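The difference between the two only shows up when x contains missing values. The sketch below uses a made-up data frame d (not the question's data) and passes na.action = na.pass, because the formula method of aggregate() drops rows with NA by default (na.action = na.omit) before FUN ever sees them.

```r
# Hypothetical toy data with one missing value in x
d <- data.frame(x = c(1, NA, 3, 4),
                g = c("a", "a", "b", "b"))

# length() counts every row in the group, including the NA
n_rows <- aggregate(x ~ g, data = d, FUN = length,
                    na.action = na.pass)

# sum(!is.na(x)) counts only the non-missing observations
n_non_na <- aggregate(x ~ g, data = d,
                      FUN = function(x) sum(!is.na(x)),
                      na.action = na.pass)

print(n_rows)
print(n_non_na)
```

Here group a has two rows but only one non-missing value, so the two results differ for a and agree for b. With the default na.action, both calls would report one observation for group a.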

深知你不懂我心
Reply #6 · 2018-12-31 03:33
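# FUN receives the Var1 values of each group, so it must take an argument.
# someValue is a placeholder for the value being counted.
lw <- function(v) length(which(v == someValue))

agg <- aggregate(Var1 ~ Var2 + Var3, data = df, FUN = lw)

# aggregate() returns three columns here: Var2, Var3, Var1
names(agg) <- c("Some", "Pretty", "Names")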

View(agg)
素衣白纱
Reply #7 · 2018-12-31 03:36

A solution using the sqldf package:

library(sqldf)
sqldf("SELECT Year, Month, COUNT(*) as Freq
       FROM df1
       GROUP BY Year, Month")