dplyr - Get last value for each year

Posted 2020-05-25 17:50

I have a tbl_df that looks like this:

> d
Source: local data frame [3,703 x 3]

         date  value year
1  2001-01-01 0.1218 2001
2  2001-01-02 0.1216 2001
3  2001-01-03 0.1216 2001
4  2001-01-04 0.1214 2001
5  2001-01-05 0.1214 2001
..        ...    ...  ...

where the dates range across several years.

I would like to get the latest entry of the 'value' column for each year (the last date in a year is not always 31 December). Is there a way to do that using an idiom such as: d %>% group_by(year) %>% summarise(...)?

Tags: r dplyr
1 Answer
神经病院院长
#2 · 2020-05-25 18:16

Here are some options

library(dplyr)
d %>% 
  group_by(year) %>%
  summarise(value=last(value))

Or maybe this, if "latest" means the row with the maximum 'date' (the description is not very clear on this):

d %>% 
  group_by(year) %>%
  slice(which.max(date)) %>%
  select(value) 

Or

d %>%
  group_by(year) %>%
  filter(date==max(date)) %>%
  select(value)

Or, in case the rows are not already ordered by 'date', we can sort with arrange first and then take the last value:

d %>%
  group_by(year) %>%
  arrange(date) %>%
  summarise(value=last(value))
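
As a small self-contained sketch of the point about ordering: dplyr's last() also accepts an order_by argument, so the sort can be expressed inside summarise. The toy data below is hypothetical (it is not the asker's d) and is deliberately left unsorted.

```r
library(dplyr)

# Hypothetical toy data, deliberately NOT sorted by date
d <- tibble(
  date  = as.Date(c("2001-12-28", "2001-01-02", "2002-12-30", "2002-06-15")),
  value = c(0.13, 0.12, 0.15, 0.14)
) %>%
  mutate(year = as.integer(format(date, "%Y")))

# For each year, take the value belonging to the latest date,
# regardless of the order in which the rows appear
res <- d %>%
  group_by(year) %>%
  summarise(value = last(value, order_by = date))

print(res)
```

With plain last(value) the result would depend on row order; with order_by = date it should not.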

In case you want to try data.table, here is one option:

library(data.table)
setDT(d)[, value[which.max(date)], year]

Or as @David Arenburg commented

 unique(setDT(d)[order(-date)], by = "year")
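
For anyone who wants to try the data.table idea on a small reproducible input, here is a sketch. The toy table dt is hypothetical, and wrapping the expression in .(value = ...) is only an assumption about what is convenient: it names the result column, which would otherwise be called V1.

```r
library(data.table)

# Hypothetical toy data, deliberately NOT sorted by date
dt <- data.table(
  date  = as.Date(c("2001-12-28", "2001-01-02", "2002-12-30", "2002-06-15")),
  value = c(0.13, 0.12, 0.15, 0.14)
)
dt[, year := year(date)]  # data.table::year() extracts the year as an integer

# One row per year: the value at the maximum date, with a named result column
res_dt <- dt[, .(value = value[which.max(date)]), by = year]

print(res_dt)
```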