R aggregate data.frame with date column

Posted 2019-01-27 01:17

Question:

I have a data frame resembling the one below:

Date       Expenditure Indicator
29-01-2011 5455        212
25-01-2012 5452        111
11-02-2011 365         5

I'm currently interested in summing up the Expenditure values, and I'm trying to use the call below:

dta.sum <- aggregate(x = dta, FUN = sum,
                     by = list(Group.date = dta$Date))

but R returns the following error:

Error in Summary.Date(c(15614L, 15614L, 15614L, 15614L, 15614L, 15614L, : sum not defined for "Date" objects

The Date column was previously converted to a date with the as.Date function. The analogous call with mean works fine:

dta.sum <- aggregate(x = dta, FUN = mean,
                     by = list(Group.date = dta$Date))

I would like to keep the Date column formatted as a date.

Answer 1:

Specify the columns you want to aggregate in your aggregate call, and this problem should be resolved:

dta.sum <- aggregate(x = dta[c("Expenditure","Indicator")],
                     FUN = sum,
                     by = list(Group.date = dta$Date))

EDITED TO ADD EXPLANATION: When you pass the whole of dta as the x argument, aggregate attempts to apply FUN to every column, including Date. sum is not defined for Date objects in R, which is why you get the error. Excluding the grouping column from x, as in the code above, avoids it.
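To make the explanation concrete, here is a minimal sketch. It rebuilds the question's sample data (the fourth row is a made-up duplicate date, added only so the grouping has something to collapse), shows that sum fails on a Date vector while mean does not, and then uses the formula interface of aggregate as an alternative way to leave Date out of the summed columns. The names msg and avg are just for illustration.

```r
# Sample data modelled on the question; the last row is a hypothetical
# duplicate date so that grouping actually merges two rows.
dta <- data.frame(
  Date = as.Date(c("29-01-2011", "25-01-2012", "11-02-2011", "29-01-2011"),
                 format = "%d-%m-%Y"),
  Expenditure = c(5455, 5452, 365, 100),
  Indicator = c(212, 111, 5, 1)
)

# sum() is not defined for Date objects, so this raises an error;
# tryCatch captures the message instead of stopping the script.
msg <- tryCatch(sum(dta$Date), error = function(e) conditionMessage(e))

# mean() is defined for Date objects, consistent with the mean version running.
avg <- mean(dta$Date)

# Formula interface: columns on the left are summed, Date on the right groups.
dta.sum <- aggregate(cbind(Expenditure, Indicator) ~ Date,
                     data = dta, FUN = sum)

class(dta.sum$Date)  # the grouping column is still of class "Date"
```

Either form should give one row per distinct date, with Date kept as a date rather than converted to text.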



Answer 2:

Upgrade from base and use data.table instead to simplify (and speed up) your code/life:

library(data.table)

dt = as.data.table(dta)

dt[, lapply(.SD, sum), by = Date]
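If the table has other columns you don't want summed, a possible variation is to name the columns explicitly with .SDcols. This is only a sketch on made-up sample data (the fourth row is a hypothetical duplicate date), not something from the original answer:

```r
library(data.table)

# Sample data modelled on the question; the last row is a hypothetical
# duplicate date so that grouping actually merges two rows.
dt <- data.table(
  Date = as.Date(c("29-01-2011", "25-01-2012", "11-02-2011", "29-01-2011"),
                 format = "%d-%m-%Y"),
  Expenditure = c(5455, 5452, 365, 100),
  Indicator = c(212, 111, 5, 1)
)

# .SDcols restricts .SD to the listed columns; Date is used only for grouping.
dt.sum <- dt[, lapply(.SD, sum), by = Date,
             .SDcols = c("Expenditure", "Indicator")]

class(dt.sum$Date)  # still "Date"
```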


Answer 3:

Or use dplyr:

library(dplyr)

dta %>%
  group_by(Date) %>%
  summarise(Tot.Expenditure = sum(Expenditure))
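To sum Indicator as well, one option is across(), assuming dplyr 1.0.0 or later. The sketch below uses made-up sample data modelled on the question (the fourth row is a hypothetical duplicate date):

```r
library(dplyr)

# Sample data modelled on the question; the last row is a hypothetical
# duplicate date so that grouping actually merges two rows.
dta <- data.frame(
  Date = as.Date(c("29-01-2011", "25-01-2012", "11-02-2011", "29-01-2011"),
                 format = "%d-%m-%Y"),
  Expenditure = c(5455, 5452, 365, 100),
  Indicator = c(212, 111, 5, 1)
)

# across() applies sum to each listed column within every Date group.
dta.sum <- dta %>%
  group_by(Date) %>%
  summarise(across(c(Expenditure, Indicator), sum))

class(dta.sum$Date)  # still "Date"
```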


Answer 4:

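# Rebuild the sample data from the question (Date is not converted with as.Date here)
df <- data.frame(c('29-01-2011', '25-01-2012', '11-02-2011'), c(5455, 5452, 365), c(212, 111, 5))
colnames(df) <- c('Date', 'Expenditure', 'Indicator')
# Sum the Expenditure column only (column 2)
colSums(df[2])

#> Expenditure 
#>       11272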
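Note that colSums gives a single grand total and ignores the Date column. If per-date totals are what's wanted, a base R sketch with tapply could look like the following; it assumes Date is first converted with as.Date, and the names total and per.date are just for illustration:

```r
# Sample data from the question, with Date converted to class Date.
df <- data.frame(
  Date = as.Date(c("29-01-2011", "25-01-2012", "11-02-2011"),
                 format = "%d-%m-%Y"),
  Expenditure = c(5455, 5452, 365)
)

# Grand total over all rows, equivalent to the colSums call above.
total <- sum(df$Expenditure)

# Per-date totals: tapply returns a named vector keyed by the date labels.
per.date <- tapply(df$Expenditure, df$Date, sum)
```

The result of tapply is keyed by the dates as text, so if you need a data frame with a proper Date column, aggregate is the better fit.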