From a data frame, is there an easy way to aggregate (sum, mean, max, etc.) multiple variables simultaneously?
Below are some sample data:
library(lubridate)
days = 365*2
date = seq(as.Date("2000-01-01"), length = days, by = "day")
year = year(date)
month = month(date)
x1 = cumsum(rnorm(days, 0.05))
x2 = cumsum(rnorm(days, 0.05))
df1 = data.frame(date, year, month, x1, x2)
I would like to simultaneously aggregate the x1 and x2 variables from the df1 data frame by year and month. The following code aggregates the x1 variable, but is it also possible to simultaneously aggregate the x2 variable?
### aggregate variables by year month
df2=aggregate(x1 ~ year+month, data=df1, sum, na.rm=TRUE)
head(df2)
Any suggestions would be greatly appreciated.
Where is this year() function from?
You could also use the reshape2 package for this task.
Using the data.table package, which is fast (useful for larger datasets): https://github.com/Rdatatable/data.table/wiki
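A minimal sketch of what the data.table approach could look like for the question's data. The set.seed() call and the base R format() calls (standing in for lubridate's year() and month()) are assumptions made only to keep the example self-contained and reproducible.

```r
library(data.table)

# Rebuild the question's sample data (seed added for reproducibility)
set.seed(1)
days <- 365 * 2
date <- seq(as.Date("2000-01-01"), length.out = days, by = "day")
df1 <- data.frame(
  date,
  year  = as.integer(format(date, "%Y")),  # base R stand-in for lubridate::year()
  month = as.integer(format(date, "%m")),  # base R stand-in for lubridate::month()
  x1 = cumsum(rnorm(days, 0.05)),
  x2 = cumsum(rnorm(days, 0.05))
)

dt <- as.data.table(df1)

# .SD holds the columns named in .SDcols; lapply() applies sum to each per group
res <- dt[, lapply(.SD, sum), by = .(year, month), .SDcols = c("x1", "x2")]
head(res)
```

Swapping sum for mean or max in the lapply() call gives the other aggregates.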
Using the plyr package
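One way this might look with plyr, as a sketch rather than the exact code of the original answer. The seed and the base R date helpers are assumptions added to keep the example self-contained.

```r
library(plyr)

# Rebuild the question's sample data (seed added for reproducibility)
set.seed(1)
days <- 365 * 2
date <- seq(as.Date("2000-01-01"), length.out = days, by = "day")
df1 <- data.frame(
  date,
  year  = as.integer(format(date, "%Y")),  # base R stand-in for lubridate::year()
  month = as.integer(format(date, "%m")),  # base R stand-in for lubridate::month()
  x1 = cumsum(rnorm(days, 0.05)),
  x2 = cumsum(rnorm(days, 0.05))
)

# Split df1 by year and month, then summarise both variables within each piece
res <- ddply(df1, c("year", "month"), summarise,
             x1 = sum(x1),
             x2 = sum(x2))
head(res)
```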
Using summarize() from the Hmisc package (column headings are messy in my example though)
Yes, in your formula, you can cbind the numeric variables to be aggregated. See ?aggregate, the formula argument and the examples.
Late to the party, but recently found another way to get the summary statistics.
library(psych)
describe(data)
Will output: mean, min, max, standard deviation, n, standard error, kurtosis, skewness, median, and range for each variable.
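Returning to the cbind suggestion above, here is a minimal sketch of how it could be applied to the question's data using only base R. The seed and the format() calls replacing lubridate are assumptions for a self-contained example.

```r
# Rebuild the question's sample data (seed added for reproducibility)
set.seed(1)
days <- 365 * 2
date <- seq(as.Date("2000-01-01"), length.out = days, by = "day")
df1 <- data.frame(
  date,
  year  = as.integer(format(date, "%Y")),  # base R stand-in for lubridate::year()
  month = as.integer(format(date, "%m")),  # base R stand-in for lubridate::month()
  x1 = cumsum(rnorm(days, 0.05)),
  x2 = cumsum(rnorm(days, 0.05))
)

# cbind() on the left-hand side lets the formula interface aggregate
# several response variables in one call
df2 <- aggregate(cbind(x1, x2) ~ year + month, data = df1, FUN = sum, na.rm = TRUE)
head(df2)
```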
Interestingly, base R aggregate's data.frame method is not showcased here; above, the formula interface is used. So, for completeness, a more generic use of aggregate's data.frame method is worth showing.
Since we are providing a data.frame as x and a list (a data.frame is also a list) as by, this is very useful when aggregate needs to be used dynamically: swapping in other columns to aggregate, or to aggregate by, is very simple.
With the dplyr package, you can use the summarise_all, summarise_at or summarise_if functions to aggregate multiple variables simultaneously. For the example dataset you can do this by grouping on year and month first.
Note: summarise_each is deprecated in favor of summarise_all, summarise_at and summarise_if.
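A minimal sketch of the dplyr approach, not necessarily the exact code of the original answer. The last call uses across(), which assumes dplyr 1.0.0 or later, where the scoped summarise_* verbs are superseded but still available. The seed and base R date helpers are assumptions for a self-contained example.

```r
library(dplyr)

# Rebuild the question's sample data (seed added for reproducibility)
set.seed(1)
days <- 365 * 2
date <- seq(as.Date("2000-01-01"), length.out = days, by = "day")
df1 <- data.frame(
  date,
  year  = as.integer(format(date, "%Y")),  # base R stand-in for lubridate::year()
  month = as.integer(format(date, "%m")),  # base R stand-in for lubridate::month()
  x1 = cumsum(rnorm(days, 0.05)),
  x2 = cumsum(rnorm(days, 0.05))
)

# Summarise a named set of variables
res_at <- df1 %>% group_by(year, month) %>% summarise_at(vars(x1, x2), sum)

# Summarise every non-grouping variable that satisfies a predicate
res_if <- df1 %>% group_by(year, month) %>% summarise_if(is.numeric, sum)

# Equivalent with across() (assumes dplyr >= 1.0.0)
res_across <- df1 %>%
  group_by(year, month) %>%
  summarise(across(c(x1, x2), sum), .groups = "drop")

head(res_across)
```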
As mentioned in my comment above, you can also use the recast function from the reshape2 package, which will give you the same result.
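A sketch of how recast might be called here, with the equivalent explicit melt and dcast steps for comparison. The seed and the base R date helpers are assumptions for a self-contained example.

```r
library(reshape2)

# Rebuild the question's sample data (seed added for reproducibility)
set.seed(1)
days <- 365 * 2
date <- seq(as.Date("2000-01-01"), length.out = days, by = "day")
df1 <- data.frame(
  date,
  year  = as.integer(format(date, "%Y")),  # base R stand-in for lubridate::year()
  month = as.integer(format(date, "%m")),  # base R stand-in for lubridate::month()
  x1 = cumsum(rnorm(days, 0.05)),
  x2 = cumsum(rnorm(days, 0.05))
)

# recast() melts and casts in one step; x1 and x2 become the measured variables
res <- recast(df1, year + month ~ variable, fun.aggregate = sum,
              id.var = c("date", "year", "month"))

# The same thing in two explicit steps
molten <- melt(df1, id.vars = c("date", "year", "month"))
res2 <- dcast(molten, year + month ~ variable, fun.aggregate = sum)

head(res)
```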