How to subset a data frame by the last day of each month

Posted 2019-07-06 15:07

Question:

I have a data frame, df:

     dates  V1  V2  V3  V4  V5  V6  V7  V8  V9  V10
1999-05-31  66  65  64  63  62  61  60  59  58  57
1999-06-01  67  66  65  64  63  62  61  60  59  58
1999-06-02  68  67  66  65  64  63  62  61  60  59
1999-06-03  69  68  67  66  65  64  63  62  61  60
1999-06-04  70  69  68  67  66  65  64  63  62  61
1999-06-17  79  78  77  76  75  74  73  72  71  70
1999-06-18  80  79  78  77  76  75  74  73  72  71
1999-06-21  81  80  79  78  77  76  75  74  73  72
1999-06-22  82  81  80  79  78  77  76  75  74  73
1999-06-23  83  82  81  80  79  78  77  76  75  74
1999-06-24  84  83  82  81  80  79  78  77  76  75
1999-06-25  85  84  83  82  81  80  79  78  77  76
1999-06-28  86  85  84  83  82  81  80  79  78  77
1999-06-29  87  86  85  84  83  82  81  80  79  78
1999-06-30  88  87  86  85  84  83  82  81  80  79

I would like to subset the above df to the last day of each month, so that from this sample only the dates 1999-05-31 and 1999-06-30 remain. The actual data frame is much larger, and the last date present for a month might be the 28th, the 29th, and so on. So I would like the output to be something like:

     dates   V1   V2   V3   V4   V5   V6   V7   V8   V9  V10
1999-05-31   66   65   64   63   62   61   60   59   58   57
1999-06-30   88   87   86   85   84   83   82   81   80   79
1999-10-29  175  174  173  172  171  170  169  168  167  166

I tried to find a function for this in zoo and other packages but could not find one. Grateful for any suggestions!

Answer 1:

This selects the rows that fall on the last calendar day of a month:

df[as.numeric(substr(as.Date(df$dates) + 1, 9, 10))
   < as.numeric(substr(df$dates, 9, 10)), ]

#        dates V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
#1  1999-05-31 66 65 64 63 62 61 60 59 58  57
#15 1999-06-30 88 87 86 85 84 83 82 81 80  79

Note that this solution depends on the calendar number of days in each month (irrespective of your data): a row is kept only if its date is the actual last day of that month.
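The same calendar check can be written a little more directly: a date is the last day of its month exactly when the next day is the 1st. Below is a minimal sketch of that idea in base R. The small df is a cut-down stand-in for the question's data (only dates and V1), and the names d, is_month_end and calendar_ends are made up for illustration.

```r
# Cut-down stand-in for the question's data (only dates and V1).
df <- data.frame(
  dates = c("1999-05-31", "1999-06-01", "1999-06-29",
            "1999-06-30", "1999-10-29"),
  V1 = c(66, 67, 87, 88, 175),
  stringsAsFactors = FALSE
)

d <- as.Date(df$dates)

# A date is a calendar month end exactly when the following day
# is the first day of a month.
is_month_end <- format(d + 1, "%d") == "01"

calendar_ends <- df[is_month_end, ]
print(calendar_ends)
```

With this check 1999-10-29 is dropped, because October has 31 days. If a row like that should be kept, the calendar-based approach is not the right one.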

If you instead want the last date that actually appears in your data for each month (assuming the rows are sorted by date), use this command:

df[c(diff(as.numeric(substr(df$dates, 9, 10))) < 0, TRUE), ]
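One caveat with the diff() approach: it detects a month change only when the day-of-month number decreases from one row to the next. If the data jump from, say, 1999-06-04 straight to 1999-07-17, the day number goes up, so the June row is missed. A possible workaround, sketched here in base R under the assumption that grouping by a year-month key is acceptable, is to compare each date with the latest date in its own month. The sample rows for July and their V1 values are hypothetical, as are the names ym, last_in_month, month_last and diff_based.

```r
# Sample data with a gap: June ends on the 4th, July starts on the 17th.
# The July rows and their V1 values are hypothetical.
df <- data.frame(
  dates = c("1999-05-31", "1999-06-01", "1999-06-04",
            "1999-07-17", "1999-07-20"),
  V1 = c(66, 67, 70, 1, 2),
  stringsAsFactors = FALSE
)

d  <- as.Date(df$dates)
ym <- format(d, "%Y-%m")           # year-month key, e.g. "1999-06"

# For every row, the latest date (as a number) within its year-month group.
last_in_month <- ave(as.numeric(d), ym, FUN = max)

# Keep rows whose date equals the latest date of their month.
month_last <- df[as.numeric(d) == last_in_month, ]
print(month_last)

# For comparison: the diff()-based selection on the same data.
diff_based <- df[c(diff(as.numeric(substr(df$dates, 9, 10))) < 0, TRUE), ]
print(diff_based)
```

The grouped version does not need the rows to be sorted, and it keeps 1999-06-04 as the last June observation, which the diff()-based selection skips on this data.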


Answer 2:

This assumes the dates column is properly stored as dates (class Date) and that the source data frame is called x.

> library(xts)
> x[endpoints(x$dates, on = "months"), ]
        dates V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1  1999-05-31 66 65 64 63 62 61 60 59 58  57
15 1999-06-30 88 87 86 85 84 83 82 81 80  79
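If dates is still a character column, the assumption above can be met with a short preparation step. This is only a sketch in base R: the three-row x is a stand-in for the real data, and sorting by date is a precaution taken on the assumption that endpoints() expects time-ordered input.

```r
# Stand-in data with dates stored as character and out of order.
x <- data.frame(
  dates = c("1999-06-30", "1999-05-31", "1999-06-01"),
  V1 = c(88, 66, 67),
  stringsAsFactors = FALSE
)

# Convert to class Date; the format string matches "YYYY-MM-DD".
x$dates <- as.Date(x$dates, format = "%Y-%m-%d")

# Sort by date so that period boundaries appear in order.
x <- x[order(x$dates), ]

# Sanity checks before handing the column to a time-series function.
stopifnot(inherits(x$dates, "Date"))
stopifnot(all(diff(x$dates) >= 0))
print(x)
```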


Answer 3:

Here is an option using dplyr:

library(dplyr)

df %>% 
  mutate(dates = as.Date(dates)) %>% 
  mutate(yr_mnth = format(dates, '%Y-%m')) %>% 
  group_by(yr_mnth) %>% 
  filter(dates == max(dates))

# or if you wanted the first observation of each month:
df %>% 
  mutate(dates = as.Date(dates)) %>% 
  mutate(yr_mnth = format(dates, '%Y-%m')) %>% 
  group_by(yr_mnth) %>% 
  filter(dates == min(dates))


Tags: r xts zoo