Complete time series by group in R

Posted 2020-05-07 19:23

Question:

I have a data frame:

dat <- data.frame(c("G", "G", "G", "G"), c("G1", "G1", "G2", "G2"), c('2017-01-01', '2017-01-03', '2017-04-02', '2017-04-05'))

colnames(dat) <- c('Country', 'Place', 'date')

I would like to get the following output, with a complete daily date sequence for each (Country, Place) group:

dat <- data.frame(c("G", "G", "G", "G", "G", "G", "G"),
                  c("G1","G1", "G1", "G2", "G2", "G2", "G2"), 
                  c('2017-01-01', '2017-01-02', '2017-01-03', 
                    '2017-04-02', '2017-04-03', '2017-04-04', '2017-04-05'))

I have tried:

dat = dat %>% group_by(Country, Place) %>% complete(date)

but it does not work. Can anyone help me with this?

Answer 1:

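You can convert date to class Date and then complete each group with a daily sequence running from its first to its last date:

library(dplyr)
library(tidyr)

dat %>%
  mutate(date = as.Date(date)) %>%   # work with real dates, not text
  group_by(Country, Place) %>%
  complete(date = seq.Date(min(date), max(date), by = "day"))   # one row per day within each group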


# A tibble: 7 x 3
# Groups:   Country, Place [2]
  Country Place date      
  <fct>   <fct> <date>    
1 G       G1    2017-01-01
2 G       G1    2017-01-02
3 G       G1    2017-01-03
4 G       G2    2017-04-02
5 G       G2    2017-04-03
6 G       G2    2017-04-04
7 G       G2    2017-04-05
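
As a minimal sketch of why the attempt in the question adds no rows: complete(date) on its own only considers the date values already present in each group, and the dates are still text rather than Date. The example below rebuilds the data from the question with named columns (the object names attempt and filled are just illustrative) and compares the row counts of the two approaches.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, built with named columns
dat <- data.frame(
  Country = c("G", "G", "G", "G"),
  Place   = c("G1", "G1", "G2", "G2"),
  date    = c("2017-01-01", "2017-01-03", "2017-04-02", "2017-04-05")
)

# Original attempt: each group already contains all of its own date
# values, so complete() has nothing to add.
attempt <- dat %>%
  group_by(Country, Place) %>%
  complete(date) %>%
  ungroup()
nrow(attempt)  # 4

# Converting to Date and supplying an explicit daily sequence fills the gaps.
filled <- dat %>%
  mutate(date = as.Date(date)) %>%
  group_by(Country, Place) %>%
  complete(date = seq(min(date), max(date), by = "day")) %>%
  ungroup()
nrow(filled)  # 7
```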


Answer 2:

Alternatively, you can use expand() together with full_seq():

library(tidyverse)

group_by(dat, Country, Place) %>% 
  expand(date = full_seq(as.Date(date), 1)) %>% 
  ungroup()

# A tibble: 7 x 3
#   Country Place date      
#   <fct>   <fct> <date>    
# 1 G       G1    2017-01-01
# 2 G       G1    2017-01-02
# 3 G       G1    2017-01-03
# 4 G       G2    2017-04-02
# 5 G       G2    2017-04-03
# 6 G       G2    2017-04-04
# 7 G       G2    2017-04-05
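
One difference between the two answers matters once the data has more columns than the ones shown in the question. The sketch below assumes a hypothetical value column (not part of the original data) to illustrate it: complete() keeps the extra column and puts NA in the added rows, while expand() returns only the grouping and expanded columns.

```r
library(dplyr)
library(tidyr)

# `value` is a hypothetical measurement column, added only for illustration
obs <- data.frame(
  Place = c("G1", "G1"),
  date  = as.Date(c("2017-01-01", "2017-01-03")),
  value = c(10, 30)
)

# complete() keeps `value` and inserts NA for the added day
kept <- obs %>%
  group_by(Place) %>%
  complete(date = full_seq(date, 1)) %>%
  ungroup()

# expand() returns only Place and date; `value` is dropped
dropped <- obs %>%
  group_by(Place) %>%
  expand(date = full_seq(date, 1)) %>%
  ungroup()

names(kept)
names(dropped)
```

If the other columns are needed, complete() is probably the more convenient choice; the result of expand() would have to be joined back to the original data.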