how to sum data by month and store them in the fir

2019-08-08 21:12发布

Lets say I have a data frame as follows:

gageID    date        flow_cms
1011000 1937-02-19   25.768334
1011000 1937-02-20   24.918828
1011000 1937-02-21   24.069322

I want to aggregate the rows that have the same month summing the flow, and store the result into a new data value: the first day of every month; in order to obtain the following output:

gageID  date  sum_monthly_flow
1011000 1937-02-01  500.2222
1011000 1937-03-01  589.222

I'm using this line:

>rowsum(qfile$flow_cms, format(qfile$date, '%Y-%m-01'))

and I obtain the right sum, but I want also to reduce the record days in a unique day: the first of every month! with the strip shows above, R cannot recognize the left coloumn as data (or date).

Help would be very much appreciated!

标签: r date aggregate
3条回答
唯我独甜
2楼-- · 2019-08-08 21:21

Another solution:

df
   gageID       date  flow_cms
1 1011000 1937-02-19  25.76833
2 1011000 1937-02-20  24.91883
3 1011000 1937-02-21  24.06932
4 1011000 1937-03-20  12.22200
5 1011000 1937-03-01 589.22200

df1 = aggregate(list(flow_cms=df$flow_cms),by=list(dates=cut(as.POSIXct(df$date),"month")),sum)

df1
       dates  flow_cms
1 1937-02-01  74.75648
2 1937-03-01 601.44400
查看更多
Emotional °昔
3楼-- · 2019-08-08 21:23

Using data.table and lubridate you can try:

library(data.table)
setDT(qfile)[, lapply(.SD, sum), by = .(gageID, date = lubridate::floor_date(date, "month"))]
    gageID       date  flow_cms
1: 1011000 1937-02-01  74.75648
2: 1011000 1937-03-01 601.44400

Note that it is assumed that date is already of class Date and that gageID is another grouping parameter.

Alternatively, the call to a function from another package can be avoided using data.table's own mday() function (thanks to @Henrik):

setDT(qfile)[, lapply(.SD, sum), by = .(gageID, date = date - mday(date) + 1)] 

Data

Here, Abdou's sample data is used:

qfile <- structure(list(gageID = c(1011000L, 1011000L, 1011000L, 1011000L, 
1011000L), date = structure(c(-12004, -12003, -12002, -11975, 
-11994), class = "Date"), flow_cms = c(25.76833, 24.91883, 24.06932, 
12.222, 589.222)), .Names = c("gageID", "date", "flow_cms"), row.names = c(NA, 
-5L), class = "data.frame")
查看更多
萌系小妹纸
4楼-- · 2019-08-08 21:31

First make sure your "date" column is properly formatted as a date object in R:

qfile$date <- as.Date(qfile$date, format = "%Y-%m-%d")

Then we can use format to extract the month and year, and group_by that for a sum and take the first date:

library(dplyr)
qfile %>% mutate(monthyear = as.character(format(date, "%m-%Y"))) %>%
          arrange(date) %>% 
          group_by(monthyear) %>%
          summarise(date=date[1], flow = sum(flow_cms))

This will give you the first taken record for each month in the data.

查看更多
登录 后发表回答