I have a dataframe and I would like to count the number of rows within each group. I reguarly use the aggregate
function to sum data as follows:
df2 <- aggregate(x ~ Year + Month, data = df1, sum)
Now, I would like to count observations but can't seem to find the proper argument for FUN
. Intuitively, I thought it would be as follows:
df2 <- aggregate(x ~ Year + Month, data = df1, count)
But, no such luck.
Any ideas?
Some toy data:
set.seed(2)
df1 <- data.frame(x = 1:20,
Year = sample(2012:2014, 20, replace = TRUE),
Month = sample(month.abb[1:3], 20, replace = TRUE))
Following @Joshua's suggestion, here's one way you might count the number of observations in your
df
dataframe whereYear
= 2007 andMonth
= Nov (assuming they are columns):and with
aggregate
, following @GregSnow:An alternative to the
aggregate()
function in this case would betable()
withas.data.frame()
, which would also indicate which combinations of Year and Month are associated with zero occurrencesAnd without the zero-occurring combinations
Considering @Ben answer, R would throw an error if
df1
does not containx
column. But it can be solved elegantly withpaste
:Similarly, it can be generalized if more than two variables are used in grouping:
The simple option to use with
aggregate
is thelength
function which will give you the length of the vector in the subset. Sometimes a little more robust is to usefunction(x) sum( !is.na(x) )
.A sql solution using
sqldf
package: