How to add only missing Dates in Dataframe R

2020-04-06 07:35发布

问题:

I have below mentioned data frame:

Date        Val1     Val2
2018-04-01  125      0.05
2018-04-03  458      2.99
2018-04-05  354      1.25

I want to add only missing dates considering Sys.Date() (Here for example Sys.Date() is 2018-04-06) in dataframe with corresponding val1 and val2 as 0.

I have tried: t2<-merge(data.frame(Date= seq(min(ymd(t1$Date)), max(ymd(date)), by = "days")), t1, by = "Date", all = TRUE)

Required Dataframe:

Date        Val1     Val2
2018-04-01  125      0.05
2018-04-02  0        0
2018-04-03  458      2.99
2018-04-04  0        0
2018-04-05  354      1.25
2018-04-06  0        0

回答1:

Here's a correction of your approach, in base R.

Replace max(t1$Date) bySys.Date() in your real application:

t2<-merge(data.frame(Date= as.Date(min(t1$Date):max(t1$Date),"1970-1-1")),
          t1, by = "Date", all = TRUE)
t2[is.na(t2)] <- 0

#         Date Val1 Val2
# 1 2018-04-01  125 0.05
# 2 2018-04-02    0 0.00
# 3 2018-04-03  458 2.99
# 4 2018-04-04    0 0.00
# 5 2018-04-05  354 1.25

data

t1 <- read.table(text="Date        Val1     Val2
'2018-04-01'  125 0.05
'2018-04-03'  458 2.99
'2018-04-05'  354 1.25",h=T,strin=F)
t1$Date <- as.Date(df$Date)


回答2:

This could be done with complete

library(tidyverse)
df1 %>%
    complete(Date = seq(Date[1], Sys.Date(), by = "1 day"),
                fill = list(Val1 = 0, Val2 = 0))

If we need to pass multiple variables for the fill, create the list of columns that we need to fill

nm1 <- setdiff(names(df1), "Date") #in this example excluding the Date
nm2 <- setNames(as.list(rep(0, length(nm1))), nm1)

and then pass that as argument for the fill

df1 %>% 
     complete(Date = seq(Date[1], Sys.Date(), by = "1 day"), fill = nm2)
# A tibble: 35 x 3
#   Date        Val1  Val2
#   <date>     <dbl> <dbl>
# 1 2018-04-01   125  0.05
# 2 2018-04-02     0  0   
# 3 2018-04-03   458  2.99
# 4 2018-04-04     0  0   
# 5 2018-04-05   354  1.25
# 6 2018-04-06     0  0   
# 7 2018-04-07     0  0   
# 8 2018-04-08     0  0   
# 9 2018-04-09     0  0   
#10 2018-04-10     0  0   
# ... with 25 more rows


回答3:

You could use padr. padr is made for filling in missing date values. First you add the missing dates based on the interval, and if you do not want NA's you fill them with a value (or function of most occuring value)

edit: added end_val to include the run until sys.Date()

library(padr)
# Specify end_val to go all the way to sys.Date and add 1 to include sys.Date
padded_df <- pad(df, interval = "day", end_val = Sys.Date()+1)
padded_df <- fill_by_value(padded_df, value = 0)
padded_df

        Date Val1 Val2
1 2018-04-01  125 0.05
2 2018-04-02    0 0.00
3 2018-04-03  458 2.99
4 2018-04-04    0 0.00
5 2018-04-05  354 1.25
.....

31 2018-05-01    0    0
32 2018-05-02    0    0
33 2018-05-03    0    0
34 2018-05-04    0    0
35 2018-05-05    0    0
36 2018-05-06    0    0